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Abstract 



Many database applications perform complex data retrieval and update tasks. Nested 
queries, and queries that invoke user-defined functions, which are written using a mix of 
procedural and SQL constructs, are often used in such applications. A straight-forward 
evaluation of such queries involves repeated execution of parameterized sub-queries or 
blocks containing queries and procedural code. Repeated execution of queries and updates 
also happens when external batch programs call database stored procedures repeatedly 
with different parameter bindings. 

Iterative execution of queries and updates is often inefficient due to lack of opportu- 
nities for sharing of work, random 10, and network round-trip delays. Query decorrelation 
is an important technique which addresses the problem of iterative evaluation of nested 
queries, by rewriting them using set operations such as joins and outer-joins. Thereby, 
decorrelation enables the use of set-oriented plans with reduced random 10, which are 
often more efficient than the alternative iterative plans. However, decorrelation is not 
applicable to complex nested blocks such as user-defined functions and stored procedures. 

The focus of this thesis is to develop query evaluation, optimization and program 
transformation techniques to improve the performance of repeatedly invoked tasks such as 
parameterized database queries, updates, stored-procedures and user-defined functions. 

To do so, we first propose enhancements to iterative query execution plans which 
improve their efficiency by exploiting sorted parameter bindings and state retention. For 
several queries, even when decorrelation is applicable, an iterative plan can be the most 
efficient alternative. Hence, speeding up the execution of iterative plans and their inclusion 
in the optimizer's search space of alternative plans is important. We show how to extend 
a cost-based query optimizer so that the effects of sorted parameter bindings and state 
retention of plans are taken into account. 

An important problem that arises while optimizing nested queries as well as queries 
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with joins, aggregates and set operations is the problem of finding an optimal sort order 
from a factorial number of possible sort orders. Our second contribution is to show that 
even a special case of this problem is NP-Hard, and present practical heuristics that are 
effective and easy to incorporate in existing query optimizers. 

We then consider iterative execution of queries and updates inside complex procedu- 
ral blocks such as user-defined functions and stored procedures. Parameter batching is an 
important means of improving performance as it enables set-orientated processing. The 
key challenge to parameter batching lies in rewriting a given procedure /function to pro- 
cess a batch of parameter values. Our third contribution is a solution, based on program 
analysis and rewrite rules, to automate the generation of batched forms of procedures and 
replace iterative database calls within imperative loops with a single call to the batched 
form. 

We present experimental results for all the proposed techniques, and the results show 
significant gains in performance. 

Keywords: Query optimization, Nested queries, Stored procedures, User-defined func- 
tions, Program transformation, Parameter batching. 
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Chapter 1 
Introduction 



Database applications with complex queries have become commonplace. For example, 
nested queries, queries containing complex joins and grouping, and queries that make 
use of procedural extensions to SQL are frequently encountered in database applications. 
Applications also make use of stored procedures, which use a mix of procedural language 
constructs and SQL. 

In such applications, queries and updates to a database are often executed repeatedly, 
with different values for their parameters. Repeated invocations of queries and updates 
can occur due to several reasons. For example, consider a nested query in which a sub- 
query or inner query would be nested below an outer query block. Example 11.11 shows 
a nested query on the TPC-H schema [55]. The query retrieves orders none of whose 
lineitems were shipped on the day the order was placed. 

Example 1.1 A Nested Query 

SELECT o-orderkey, o_orderdate FROM ORDERS 
WHERE o-orderdate NOT IN (SELECT Lshipdate FROM LINEITEM 

WHERE Lorderkey = o_orderkey); 



Sub-queries can use parameters whose values are bound by the outer query block as 
illustrated in Example 11.11 The default way of executing nested queries is to iteratively 
invoke the sub-query for each parameter binding produced by the outer query block. Sim- 
ilarly, when a query makes use of a user-defined function (UDF) in its WHERE/SELECT 
clause, the UDF will be invoked multiple times with different values for its parameters. 
As a result, queries used inside the UDF get executed repeatedly. External programs 
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that perform batch processing are another important reason for repeated invocation of 
queries/updates. Such programs call database stored-procedures repeatedly by iterat- 
ing over a set of parameters, and as a result, cause repeated invocations of queries and 
updates contained in the body of the stored-procedure. Application programs, stored- 
procedures and user-defined functions can also make explicit use of looping constructs, 
such as FOR/WHILE/CURSOR loops, and invoke queries/updates inside the loop. 



1.1 Problem Overview and Motivation 

Iterative execution plans for queries and updates are often very inefficient and result in 
poor application performance. Lack of opportunity for sharing of work (e.g., disk 10) 
between multiple invocations of the query/update is a key reason for the inefficiency of 
iterative plans. Random disk 10 and network round-trip delays also degrade the per- 



formance of iterative plans. Consider the query shown in Example 11.11 A naive nested 
iteration plan for the query would invoke the sub-query on the LINEITEM table for every 
tuple of the ORDERS relation. If a useful index is not present to answer the sub-query, 
the LINEITEM table would be scanned once for each invocation of the sub-query. In 
practice, an index on the O-orderkey column is expected to exist and the sub-query can 
be evaluated with an index lookup. However, repeated index lookups result in a large 
number of random IOs and can lead to poor performance. 

An important technique used to address poor performance of nested queries is query 
decorrelation, also known as query unnesting[3TJ EHl ESI El ESI HH1 EZl E2] ■ Query decorre- 
lation aims at rewriting a given nested query as an equivalent non-nested query by making 



use of set operations such as joins and outer-joins. Example [O 
form of the query in Example 11.11 



shows a decorrelated 



Example 1.2 A Decorrelated Form of the Nested Query in Example ] 1. II 
SELECT O-orderkey, o_orderdate 

FROM ORDERS ANTI SEMI JOIN LINEITEM ON 

0-orderdate=Lshipdate AND o-orderkey=Lorderkey; 



1 Standard SQL does not provide a construct for anti semi-join. The syntax used here is merely to 
illustrate the internal representation after decorrelation 
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By rewriting nested queries using set operations, decorrelation expands the space 
of alternative execution strategies, which otherwise would be restricted only to iterative 
execution plans. The query optimizer, can now consider set-oriented strategies, such as 
hash-join and merge-join to answer a nested query. The query optimizer estimates the 
cost of alternative plans, including the iterative execution plans, and chooses the one with 
the least expected cost. 



Example 1.3 A Query with UDF Invocation 



SELECT orderid FROM sellorders 

WHERE mkt='NSE' AND count_offers (itemcode, amount, curcode) > 0; 

INT count_offers(INT itemcode, FLOAT amount, VARCHAR curcode) 
DECLARE 

FLOAT amount-usd; 
BEGIN 

IF (curcode == "USD") 

amount_usd := amount; 

ELSE 

amount_usd := amount * (SELECT exchrate FROM curexch 

WHERE ccode = curcode); // (ql) 

END IF 

RETURN (SELECT count(*) FROM buyoffers 

WHERE itemid = itemcode AND price >= amount-usd); // (q2) 

END; 



Although decorrelation techniques are applicable for a wide variety of nested queries, 
iterative execution of queries and updates can still occur due to the following reasons: 

1. For several queries, even when decorrelation is applicable, an iterative execution 
plan might be the most efficient of the available alternatives [22], HI] . 

2. Known decorrelation techniques are not applicable for complex nested blocks such 
as those containing procedural code. For example, consider the query shown in 
Example 11.31 The user-defined function count-offers used in the where clause of 
the query forms a nested block with procedural code and sub-queries inside it. 
Decorrelation techniques proposed till date are not applicable to such queries, except 
in special cases when the body of the UDF is very simple. 

3. Decorrelation is also not applicable when queries and updates are invoked from 
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imperative loops (e.g., a while loop) of external programs and user-defined functions 
or stored procedures. 

The focus of this thesis is to develop query evaluation, optimization and program 
transformation techniques to improve the performance of repeatedly invoked parameter- 
ized sub-queries, updates, stored-procedures and user-defined functions. To this end we 
make the following contributions. 

1.2 Summary of Contributions 

1. We propose new query evaluation techniques, which make use of sort order of pa- 
rameter bindings and state retention across calls, to speed up the evaluation of 
nested queries. These alternative techniques augment the optimizer's plan space 
with improved nested iteration plans. We show how to extend a cost-based query 
optimizer so that the effects of sorted parameter bindings and state retention are 
taken into account. 

2. We address the problem of choosing optimal sort orders during query plan genera- 
tion. The problem of choosing optimal sort orders is important not only for nested 
queries, but also for queries containing joins, aggregates and other set operations. 

3. We address the problem of efficient evaluation of repeatedly called user-defined 
functions and stored-procedures, which contain SQL queries and updates embedded 
within procedural code. We present an approach, based on program analysis and 
transformation, to automatically generate batched forms of procedures. Batched 
forms of procedures work with a set of parameter bindings and thereby enable set- 
oriented processing of queries/updates inside the procedure body. 

We provide details of each of these contributions, below. 

1.2.1 Improved Iterative Execution with Parameter Sorting 

Parameterized sub-queries being pure (side-effect free) functions, it is possible to reorder 
their invocations without affecting the results. Ordered (sorted) parameter bindings pro- 
vide several opportunities for speeding up the sub-query evaluation. For example, it is 
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known [22] that ordered parameter bindings reduce random disk access and yield better 
cache hit ratio. We propose additional query evaluation techniques, which exploit sorted 
parameter bindings by retaining state across calls. 

Restartable Scan: Nested sub-queries or queries inside user-defined functions often select 
tuples from a relation, one of whose columns matches the parameter value. The selection 
predicate involving the parameter is called correlation predicate. In the query of Exam- 
ple ll.H the selection predicate Lorderkey=o-orderkey in the sub-query is a correlation 
predicate. A relation referenced by the sub-query is called an inner relation. If the inner 
relation is stored sorted (clustered) on the column appearing in the correlation predi- 
cate, making the sub-query invocations in the sorted order of parameter values allows the 
following: at the end of each invocation, the scan for the inner relation can remember 
its position in the relation and restart from that point in the next invocation. Thus, 
the restartable scan enables us to achieve the efficiency of set-oriented algorithms like 
merge-join to situations where a merge-join is not directly applicable. 

Incremental Computation of Aggregates: Nested queries where the sub-query computes 
an aggregate value are often encountered in practice and are known as nested aggregate 
queries. For nested aggregate queries having non-equality correlation predicates (<,<,> 
or >), known query processing strategies are very inefficient. We describe a strategy, 
which by employing a combination of restartable scan and a state retaining aggregate 
operator, computes the result of the nested aggregate query very efficiently. 
Plan Generation: While considering alternative plans for a given query, it is imperative 
that the query optimizer must take into account query execution plans that exploit ordered 
parameter bindings and estimate their cost appropriately. We present a cost-model of 
operators, which takes into account the effect of sorted parameter bindings. We then 
address the problem of extending a cost-based query optimizer to take into account state 
retention of operators and the effect of sorted parameter bindings. We have implemented 
the proposed ideas and present experimental results, which show significant benefits for 
several classes of queries. 

A key challenge in optimizing queries by taking parameter sort orders into account, 
is to decide the optimal sort order of parameters. This problem is important not only for 
nested queries, but also for queries containing joins, aggregates and other set operations. 
We describe it next. 
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1.2.2 Choosing Sort Orders in Query Optimization 

For a given set of sub-query parameters, several sort orders are possible. Different sort 
orders can result in different plan costs and the query optimizer must make a judicious 
choice of sort orders to use. Sort orders, in general, play an important role in query 
processing. Algorithms that rely on sorted inputs are widely used to implement joins, 
grouping, duplicate elimination and other set operations. The notion of interesting or- 
ders [48J has allowed query optimizers to consider plans that could be locally sub-optimal, 
but produce ordered output beneficial for other operators, and thus be part of a globally 
optimal plan. However, the number of interesting orders for most operators is factorial in 
the number of attributes involved. For example, all possible sort orders on the set of join 
attributes are of interest to a merge-join. Considering the exhaustive set of sort orders is 
prohibitively expensive as the input sub-expressions must be optimized for each of these 
sort orders. The following factors make the problem of choosing sort orders non-trivial. 

• Clustering and secondary indices that cover the query (an index is said to cover a 
query if it includes all the attributes required to answer the query) make it possible 
to produce some sort orders at much lesser cost than others. 

• Partially available sort orders can greatly reduce the cost of intermediate sorting 
step. For example, if the input is sorted on attribute a, it is more efficient to obtain 
the sort order (a, b) as compared to the sort order (b, a). 

• Attributes common to multiple joins, group-by and set operations must be taken 
into account for choosing globally optimal sort orders. For example, consider the 
following query. 

SELECT R.a, S.b, T.c FROM R, S, T 

WHERE R.a = S.a AND R.b = S.b AND S.b = T.b AND S.c = T.c 
GROUP BY R.a, S.b, T.c; 

A good query execution plan makes a coordinated choice of sort orders for the two 
joins and the group-by, so that the cost of intermediate sorting steps is minimized. 

• Binary operators like merge-join, can accept any sort order on the attributes in- 
volved, but require a matching sort order from their two inputs. 
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Previous work on sort orders has mainly focused on inferring sort orders from func- 
tional dependencies and predicates in the input sub-expression. Simmen et.al. [5T] high- 
light the importance of choosing good sort orders for operators like group-by, which have 
flexible order requirements. But their approach cannot be used for binary operators such 
as merge-join, which require a matching sort order from their two inputs. 

We show that even a simplified version of the problem of choosing sort orders is NP- 
Hard. We then give an approach to address the general case of the problem of choosing 
sort orders. 

1. We introduce the notion of favorable orders, which characterizes the set of sort 
orders easily producible on the the result of an expression. The notion of favorable 
orders allows us to consider promising sort orders during plan generation. 

2. We show how a cost-based query optimizer can be extended to identify favorable 
orders and make use them to choose good sort orders during plan generation. Our 
optimizer extensions also take into account plans that may not completely produce 
a required sort order but produce only a part (prefix) of the required sort order. 
When a sort order is partly available, a partial sort operator is used to obtain the 
complete (desired) sort order. The partial sort operation uses a modified version of 
the standard external-sorting algorithm. 

3. We introduce a plan refinement phase in which the sort orders chosen during plan 
generation are further refined to take into account attributes common to different 
operators, and thus reduce the cost of intermediate sorts further, when possible. 

We have implemented the proposed techniques in a cost-based query optimizer, and 
we carry out a performance comparison of the plans generated by our optimizer with 
those of two widely used database systems. The results show performance benefits up to 
50%. 

1.2.3 Rewriting Procedures for Batched Bindings 

Several data retrieval and update task need more expressive power than what standard 
SQL offers. Therefore, many applications perform database queries and updates from 
within procedural application code. Database systems also support stored procedures 
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and user-defined functions (UDFs), which can use a mix of procedural constructs and 
SQL. Repeated execution of UDFs and stored procedures can occur if queries make use 
of UDFs in their WHERE or SELECT clause, or if a stored procedure is invoked from 
external batch processing applications. In Example ll.3[ the function counter-offers gets 
invoked for every tuple in the sellorders table. This in turn results in multiple invocations 
of queries inside the function body. Known decorrelation techniques do not apply to such 
cases and hence most database systems are forced to choose iterative execution plans. 

An important technique to speed up repeated invocation of such procedures/functions 
is parameter batching. Parameter batching allows the choice of efficient set-oriented plans 
for queries and updates. For example, the the batched form of the UDF count-offers would 
take a set of triplets (itemcode, amount, curcode) and return a set comprising of the func- 
tion's results for each triplet. By working with a set of parameters, the batched form 
can avoid repeated calls to the queries contained inside the function body. For instance, 
the batched form of the UDF count-offers, can issue a single join query to obtain the 
exchange rates for all the required currencies. Assuming the set of parameters to the 
UDF is available in a temporary relation pbatch, the query issued by the batched form of 
count_offers can be as follows: 

SELECT pb. curcode, cx.exchrate FROM curexch cx, pbatch pb 
WHERE cx.ccode = pb. curcode; 

The two key challenges in exploiting batching lie in developing techniques to 

1. automatically rewrite a given procedure to work with a batch of parameters bindings. 
In other words, to generate the batched form of a given procedure. It is possible for 
an application developer to manually create the batched form of a procedure, but 
the process is time consuming and error prone. 

2. automatically rewrite a program, which iteratively invokes a database procedure 
from within an imperative loop, such as a while loop, so that the rewritten program 
makes a single invocation of the batched form of the procedure. 

We present an approach, based on program analysis, to automate the generation 
of batched forms of procedures and replace iterative calls by code to create parameter 
batch and then call the batched form. The approach comprises of a set of program 
transformation rules, which make use of conditions on the data dependence graph of 
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the given program. The data dependence graph gives information about various types 
dependencies between program statements and is obtained through program analysis. 

To the best of our knowledge no previously published work addresses the problem of 
rewriting procedures to accept batched bindings. Lieuwen and DeWitt [31] consider the 
problem of optimizing set iteration loops in database programming languages. The goal of 
their work is to convert nested set iteration loops arising in object-oriented database lan- 
guages into joins, and they propose a program transformation based approach to achieve 
this. The rewrite rules proposed in our thesis share some similarities with their work, but 
are designed to address the problem of batching database calls made from programs writ- 
ten in general procedural languages. Our rewriting technique can be used with complex 
programs, written using a rich set of language constructs such as set-iteration (or cursor) 
loops, while loops, control flow statements (if-then- else) and assignment statements. Our 
rewrite rules make use of some of the techniques, such as loop splitting, known in the 
context of parallelizing compilers [29J . 

In many situations, the inter-statement data dependencies within a program may 
not permit set-oriented evaluation of a specific query inside the program body. We show 
that by appropriately reordering the program statements and by introducing temporary 
variables, we can enable set-oriented evaluation in large number of cases. 

Our rewrite rules can conceptually be used with any language. We have prototyped 
the rewrite techniques for a subset of Java. We present our performance study on exam- 
ples taken from three real-world applications. The results are very promising and show 
performance benefits up to 75% due to the proposed rewriting techniques. 

1.3 Organization of the Thesis 

This thesis is organized as follows. Chapter [2] describes our work on optimizing nested 
queries by exploiting parameter sort orders. This is followed by Chapter [3], which ad- 
dresses the problem of choosing interesting sort orders. Chapter H] addresses the problem 
of parameter batching for procedure calls. Chapter concludes the thesis with some 
comments on possible directions for future work. 
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Chapter 



2 



Iterative 



Plans with Parameter 



Sorting 



In many cases of iterative execution, such as those involving sub-queries or functions, the 
order of calls to the nested sub-query or the function does not affect the end result. Or- 
dering the query /function invocations on the parameter values gives several opportunities 
for improving the efficiency For example, System R[l8] caches and reuses inner sub-query 
results for duplicate parameter bindings, and it uses sorted parameter bindings so that 
only a single result of the inner sub-query is required to be held in memory at any given 
time. Graefe [22] emphasizes the importance of nested iteration plans, and highlights 
sorting of outer tuples as an important technique to improve buffer effects. 

This chapter describes additional query evaluation techniques that exploit sort order 
of parameter bindings. The task of a query optimizer is to consider alternative plans 
for a given query and choose the best, i.e., the least cost plan. The optimizer must 
therefore consider iterative plans that exploit sort order of parameters, and estimate their 
cost appropriately. The second part of this chapter address the problem of extending a 
cost-based query optimizer to consider iterative plans exploiting parameter sort orders. 

The rest of this chapter is structured as follows. After stating some basic definitions 
in Section I2TT1 in Section I2T21 we illustrate how to make use of ordered parameter bindings 
to improve the efficiency of iterative query execution plans. Section 12.31 illustrates how 
a Volcano style cost-based optimizer [23] can be extended to consider the proposed tech- 
niques, and Section 12.41 presents experimental results and analysis. We discuss related 
work in Section I2.5[ and summarize our work in Section 12.61 
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2.1 Definitions 



Definition 2.1 Sort Order 

Let s be a relational schema having n attributes, Oi, . . . , o n . A sort order o on schema s 
is an ordered sequence of attributes from any subset of {cti, . . . , a n }. 

We denote sort orders by enclosing the sequence of attributes in parentheses as in (oi, a 4 , a 5 ) 
A sort order s = (a s i, a s2 , . . . , a s k) is said to hold on a sequence of tuples conforming to 
schema s, if the sequence has the tuples sorted on attribute a s \ and for a given value 
of a si the tuples are sorted on a s2 , and so on up to a sk . Note that we ignore the sort 
direction (ascending/descending) in our description. Sort direction can be represented by 
using the complement notation, e.g., (01,02) can be used to represent a sequence sorted 
in non-decreasing order on attribute 01 and then in non- increasing order on attribute a 2 . 
The techniques we propose in this chapter are applicable independent of the sort direction, 
and hence we omit the sort direction in all our description. 

Definition 2.2 Subsumption of Sort Orders 

Sort order Oi is said to subsume sort order o 2 iff o 2 is a prefix of Oi . 

For example, sort order (a, b, c) subsumes sort order (a, 6). Note that the subsumption 
relation forms a partial order on the set of sort orders. 

2.2 Query Evaluation with Ordered Parameters 

Query evaluation algorithms for standard relational operators, such as selection, join and 
grouping are well studied [20]. For example, a relational selection can be implemented as 
a table scan or an index lookup, and a relational join can use hashing, sorting or repeated 
index lookups (index nested loops join). A query execution plan comprises of operators, 
each of which implements such an algorithm. In the case of iterative execution of query 
plans, the operators are initialized with a different parameter in each iteration and then 
executed, which gives no opportunities for reordering or sharing of disk 10. In this section, 
we illustrate how the standard query evaluation techniques can be extended for improved 
performance by making use of the sort order of query parameters. 
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2.2.1 Rest art able Table Scan 



Nested sub-queries or queries inside user-defined functions often select tuples from a rela- 
tion, one of whose columns matches the parameter value. The selection predicate involving 
the parameter is called the correlation predicate. In the query of Example ll.il the selection 
predicate Lorderkey =o_orderkey in the sub-query is a correlation predicate. A relation 
referenced by the sub-query is called an inner relation. In Example 11.11 LINEITEM is an 
inner relation. 

Consider an inner relation, which is stored sorted on the column that appears in an 
equality correlation predicate. For instance, if the LINEITEM table in Example 11.11 has 
a clustering index on the Lorderkey column, it would be stored sorted on the Lorderkey 
column. Now, if the sub-query invocations are made in the order of the parameter values, 
we can employ a restartable table scan for the inner relation. The restartable table scan 
works as described next. In the following description we assume the parameter bindings to 
be duplicate free. For the first value of the parameter, the scan starts from the beginning of 
the relation (or the first matching tuple in the clustering index) and returns all the records 
that match the equality correlation predicate. The scan stops on encountering a record 
that does not satisfy the correlation predicate. The scan remembers this position. For 
subsequent bindings of the parameter, the scan continues from the remembered position, 
i. e., the position at which it left off in the previous binding. This allows the complete query 
to be evaluated with at most one full scan of the inner relation. In the above explanation, 
we assumed the parameter bindings to be duplicate free. Duplicate parameter values 
can be handled in two ways: (a) Cache the sub-query result and reuse it for subsequent 
invocations with the same parameter value, thus avoiding re-execution of the subquery 
with the same parameter values. When the parameter values are sorted, at most one 
result needs to be cached at any given time |48j . and (b) remember the most recent 
parameter value (v) and two positions segstart and segend for the inner relation scan 
- segstart positioned at the first tuple which matched the parameter value and segend 
positioned at the last tuple that matched the parameter value. If a new call has the same 
parameter value (v), continue the scan from segstart, otherwise continue the scan from 
segend. In general, the approach of caching the sub-query result is more efficient unless 
the subquery result for each invocation is too large to fit in memory. 

Apart from clustering index, query covering secondary indices also allow the use of 
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restartable table scan. An index is said to cover a query, if the index leaf pages contain 
all the columns required to answer the query. By having additional columns (columns in 
addition to the index key) in the index leaf pages, the index supports reading tuples in the 
index key order without incurring random 10 to fetch data pages. Thus query covering 
indices make it possible to efficiently obtain alternative sort orders for the same relation, 
and are being used increasingly in read intensive applications. 

Cost Model 

Given an iterative plan, traditional query optimizers estimate the plan cost as follows. 
The plan cost for the inner and outer sub-plans are computed independently by adding 
the individual operator costs. The cost of the inner sub-plan is then multiplied by the 
estimated number of times it is invoked, which is the number of distinct parameter values 
bound from the outer query block. With operators such as restartable scan, which retain 
state across calls, such a model of computing the plan cost cannot be used. The solution 
is to cost a plan for n invocations at once. Accordingly, each operator in the plan must 
have a cost function, which takes into account the number of invocations. 

Thus, the cost of restartable scan for n invocations with parameterized selection 
predicate p on relation r having B r disk blocks is computed as follows. 

{B r if r is sorted to support p 
INF otherwise, (the operator cannot be used) 

On the other hand, if a plain relation scan were to be employed, the relation would 
be scanned n times over all the iterations, amounting to a cost of n x B r . 

A plan that employs ordered parameter bindings and restartable scan has the fol- 
lowing advantages over a plan that employs naive iterative index lookups. 

1. Performs sequential reads and hence incurs reduced seek time and permits prefetch- 
ing of disk blocks. 

2. If more than one record from the same data page are needed, parameter sorting 
guarantees that the page is accessed exactly once irrespective of the buffer replace- 
ment policy. 
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However, if a query requires very small number of tuples from the inner relation, an 
index lookup plan is generally more efficient than the restartable scan. In such cases the 
index lookup plan avoids reading most of the relation while the restartable scan has to 
perform a complete scan of the relation. The iterative index lookup plan can however 
benefit from sorted parameter bindings as we shall see in Section 12.2.21 

The effect produced by a restartable scan is similar to that of a merge join. In 
essence, the restartable scan extends the benefits of merge-join to iterative plans. This 
is important since merge-join is not directly applicable to complex nested blocks such as 
user-defined functions with embedded queries. 

2.2.2 Clustered Index Scan with Parameter Sorting 

If a clustering index exists for the inner relation on the column that is involved in the 
correlation predicate and if the query requires a small number of tuples from the inner 
relation, it is often more efficient to employ iterative index lookups as against a restartable 
scan. However, a naive iterative index lookup plan leads to random 10. Performance of 
clustered index lookups in the evaluation of correlated nested queries can be greatly 
improved by producing the parameter bindings in sorted order [22]. Sorting ensures 
sequential I/O and therefore permits prefetching. Further, sorting of parameters ensures 
each data page is fetched at most once irrespective of the buffer replacement policy. In 
this section, we derive a cost model for iterative clustered index lookups with sorted keys. 
An accurate cost-model, which takes into account the benefits of parameter sorting, is 
essential for the optimizer to pick the overall best plan. 

Cost of Clustered Index Lookups with Sorted Parameters 

For ease of illustration, we assume the outer query block references a single relation R 
and the inner block references a single relation S. We assume the following statistics are 
available for the optimizer. 

• Number of blocks occupied by the outer relation = B r 

• Number of tuples in the outer relation = N r 

• Number of blocks occupied by the inner relation = B s 

• Number of tuples in the inner relation = N s 

• Tuples per block for the inner relation = F s 
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• Number of distinct values for the attribute of S involved in the correlation predicate = d. 

• Number of inner relation tuples that match each value of the correlation variable = C s . 
Assuming uniform distribution, C s = N s /d. 

• Combined selectivity of all the simple predicates in the outer block = S D 

• t t : Transfer time for a block, default value 0.1 msec for a 4K block at 40 MB/s 

• t s : Seek time, default value 4 msec 

Cost Estimate Without Sorting 

When the correlation bindings that act as the lookup keys for the clustered index are not 
guaranteed to follow any order, each record fetch can potentially require a disk I/O. The 
number of records from the inner relation that match each correlation binding is C s . Since 
the inner relation is clustered on the lookup column the records to be fetched would be 
stored contiguously, and hence occupy \C S /F S ] contiguous blocks. Let k be the average 
number of cache misses on index nodes for each lookup. Then the estimated cost of each 
lookup and fetch is given by: 

Q = t t (\C s /F s ] +k)+t s (k + l) 

The total estimated cost (across all the iterations) would thus be N r x S x Ci. 
Cost Estimate With Sorting 

We consider two cases: 

1. When the outer predicate is such that the correlation bindings contain all the values 
in an interval of the index (e.g., an outer predicate on the correlation attribute, 
which is also a foreign key of the inner relation), the inner relation records accessed 
over all the iterations lie on a set of contiguous blocks. Thus multiple lookups can 
be served from a single block fetch. The total number of records from the inner 
relation accessed over all the iterations will be A s — N r x S x C s . As the records 
are stored contiguously, the total inner relation access cost will be: t t (\A s /F s ]) + t s . 

Assuming N r = 100, 000, S D = 0.5, C s = 10 and F s = 100, the expected num- 
ber of inner relation's blocks accessed (across all iterations) with sorted correlation 
bindings will be 5000, and we pay only a transfer cost for each. On the other hand 
the expected number of blocks accessed without sorting will be 50,000 (ten times 
higher) even with no cache misses for the index pages (i.e., k — 0). When the 
correlation bindings are not sorted multiple fetches can occur for the same block of 
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the inner relation due to the interleaved access with other blocks. This is the reason 
for the higher estimated cost (which assumes a worst case cache behavior) when the 
correlation bindings are not sorted. 

2. When the predicates of the outer query block are on attributes different from the 
ones used as correlation variables, the correlation bindings will be from disjoint 
intervals. Let j be the expected number of times the inner query block is evaluated 
(j is the number of distinct correlation bindings generated from the N r x S Q tuples 
that qualify the outer predicates). Let q = \F S / C s ] . q denote the number of distinct 
values for the attribute of S involved in the correlation predicate, which are stored 
on each block. Recall that d is the total number of distinct values for the attribute 
of S that is involved in the correlation predicate. We can estimate the number 
of inner relation's blocks accessed as follows: a block of the inner relation will be 
accessed if any of the q distinct values in it is part of the j distinct correlation 
bindings. Therefore, the probability that a block of S gets accessed is given by 
p — (1 d^r 1 ), where m C n is the notation for choosing n from m, i.e., ( m ^) !n! - 

Therefore, the expected number of blocks read, numJblocks will be: p x B s . 

The cost estimate will thus be min(scantime, (t s + t t )num Mocks) where scantime 
is the time to scan the complete table (t s + B s x t t ). 

2.2.3 Incremental Computation of Aggregates 

We now describe an efficient technique to evaluate nested aggregate queries having non- 
equality correlation predicates, using the restartable scan. Decorrelation is often very 
expensive for such queries. Consider the SQL query shown in Example 12.11 The query 
lists days on which the sales exceeded the sales seen on any day in the past. 

Example 2.1 A Nested Aggregate Query with Inequality Predicate 

SELECT day, sales FROM DAILYSALES DS1 
WHERE sales > (SELECT MAX (sales) FROM DAILYSALES DS2 
WHERE DS2.day < D SI. day); 



A naive nested iteration plan for the above query employs a sequential scan of the 
DAILYSALES table for both the outer and the inner block. Assuming the inner block 
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scans an average of half of the table for each outer tuple, the cost of this plan would 
be t t (B ds + N ds x B ds /2) + t s (l + N ds ), where B ds is the number of blocks occupied by 
DAILYSALES table, N ds is the number of tuples in the same table, and t t and t s are the 
block transfer time and seek time respectively. 

Now, suppose the DAILYSALES relation (or materialized view) is stored, sorted 
on the day column. If the plan for the outer query block generates the bindings for 
the correlation variable (DAILYSALES. day) in non- decreasing order, we can see that the 
tuples that qualify for the aggregate (MAX) operator's input in the i th iteration will be a 
superset of the tuples that qualified in the {i — l) th iteration. The MAX operator, in its 
state, can retain the maximum value seen so far and use it for computing the maximum 
value for the next iteration by looking at only the additional tuples. So, the scan needs to 
return only those additional tuples that qualify the predicate since its previous evaluation. 
The technique proposed here is applicable for <, <, > and > predicates and the aggregate 
operators MIN, MAX, SUM, AVG and COUNT. The maximum cost of such a plan would 
be 2 x B ds x t t + 2 x t s , which is significantly lesser than the cost of the naive nested 
iteration plan. 

When there are GROUP BY columns specified along with the aggregate, the aggre- 
gate operator has to maintain one result for each group. The aggregate operator can 
maintain its state in a hash table; the key for the hash table being the values for the 
GROUP BY columns and the value against each key being the aggregate computed so far 
for the corresponding group. 

2.3 Parameter Sort Orders in Query Optimization 

A query optimizer considers alternative execution plans for a given query, estimates the 
cost of each plan and chooses the plan with the least expected cost. To estimate the cost 
of an iterative plan, traditional optimizers first identify the best plan for the nested sub- 
query independently and multiply its cost by the expected number of iterations. Clearly, 
this approach does not take into account plans that exploit ordered parameter bindings. 
The optimizer must consider different sort orders on the sub-query parameters. For each 
sort order, there is an associated benefit for the sub-query plan that exploits the sort order 
and a cost for the outer query plan to generate the parameters in the required order. The 
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optimizer must choose the optimal plan for the query by considering benefits and cost of 
various possible sort orders. 

The Volcano query optimization framework [23J is a state of the art cost-based query 
optimizer. This section illustrates how a Volcano style query optimizer can be extended 
to take into account plans that make use of parameter sort orders. The rest of this 
section is organized as follows. Section 12.3.11 briefly describes the Volcano optimizer 
framework. Section 12.3.21 proposes extensions to the optimizer's high-level interface to 
support optimization of parameterized expressions. Section 12.3.31 describes the logical 
representation we adopt for nested queries. The changes to the to plan space and search 
algorithm are described in Section 12.3.41 and Section 12.3.51 respectively. 

2.3.1 The Optimizer Framework 

In this section we briefly describe the PYRO cost-based optimizer framework over which 
we propose our extensions. PYRO is an extension of the Volcano optimizer [23] . A detailed 
description of the PYRO optimizer can be found in in (35] and |47| . 
The optimizer performs three main tasks. 

1. Logical Plan Space Generation 

In this first step the optimizer, by applying logical transformations such as join asso- 
ciativity and pushing down of selections through joins, generates all the semantically 
equivalent rewritings of the input query. 

2. Physical Plan Space Generation 

This step generates several possible execution plans for each rewriting produced 
in the first step. An execution plan specifies the exact algorithm to be used for 
evaluating each logical operator in the query. Apart from selecting algorithms for 
each logical operation, this step also considers enforcers that help in producing 
required physical properties (such as sort order of tuples) on the output. Physical 
property requirements arise due to two reasons (i) the user/application may specify 
a physical property requirement as part of the query, e.g., an order-by clause and (ii) 
algorithms that implement operations such as join and duplicate elimination may 
require their inputs to satisfy a sort order or grouping property. The algorithms for 
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relational operators and the enforcers of physical properties are collectively referred 
to as physical operators as against the logical operators of the logical plan space. 



3. Finding the Best Plan 

Given the cost estimates of different algorithms that implement the logical opera- 
tions and the enforcers, the cost of each execution plan is estimated. The goal of 
this step is to find the plan with minimum cost. 



The above three steps can either be executed in a depth-first order or in a breadth- 
first order [17] . For the purpose of our explanation we consider the breadth-first order, 
in which each step is performed completely before the next step is started. However, 
in our actual implementation, physical plan generation and search for the best plan are 
combined in a single phase. 

An AND-OR graph representation called Logical Query DAG (LQDAG) is used to 
represent the logical plan space, i.e., all the semantically equivalent rewritings of a given 
query. The LQDAG is a directed acyclic graph whose nodes can be divided into equivalence 
nodes and operation nodes; the equivalence nodes have only operation nodes as children 
and the operation nodes have only equivalence nodes as children. An operation node 
in the LQDAG corresponds to an algebraic operation, such as join (n) or select (er). It 
represents the expression defined by the operation and its inputs. An equivalence node 
in the LQDAG represents the equivalence class of logical expressions (rewritings) that 
generate the same result set, each expression being defined by a child operation node of 
the equivalence node and its inputs. An example LQDAG is shown in Figure liOf . 




AC 



A 




B 




C 



Figure 2.1: A Logical Query DAG for A n B n C 



^^This figure is taken from [47j . 
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Once all the semantically equivalent rewritings of the query are generated, the Vol- 
cano optimizer generates the physical plan space by considering different algorithms for 
each logical operation and enforcers to generate required physical properties. In some 
optimizers, such as Cascades [21] and SQL Server [IB] , the logical and physical plan space 
generation stages are intermixed. The physical plan space is represented by an AND-OR 
graph called PQDAG which is a refinement of the LQDAG. Given an equivalence node 
e in the LQDAG, and a physical property p required on the result of e, there exists an 
equivalence node in the PQDAG representing the set of physical plans for computing the 
result of e with the physical property p. A physical plan in this set is identified by a child 
operation node of the equivalence node and its input equivalence nodes. The equivalence 
nodes in a PQDAG are called physical equivalence nodes to distinguish them from the 
logical equivalence nodes of the LQDAG. Similarly, the operation nodes in a PQDAG are 
called physical operation nodes to distinguish them from the logical operation nodes of the 
LQDAG. 

The optimizer framework we use models each of the logical operators, physical oper- 
ators and transformation rules as separate classes, and this design permits the extensions 
we propose to be easily incorporated. 

2.3.2 Extensions to the Optimizer Interface 

Both Volcano [23] and PYRO [17] optimizers take the initial query (expression), a set of 
physical properties (such as sort order) required on the query result and a cost limit (the 
upper bound on plan cost) as inputs and return the execution plan with least expected 
cost. The following met hod- signature summarizes the Volcano optimizer's input and 
output. 

Plan FindBestPlan (Expr e, PhyProp p, CostLimit c); 

The optimizer makes an assumption that if the expression is evaluated multiple times the 
cost gets multiplied accordingly. This assumption ignores advantageous buffer effects due 
to sorted parameter bindings and the benefits due to state retention techniques proposed 
in the previous section. When the parameter bindings are sorted, the cost of evaluating 
an expression n times can be significantly lesser than n times the cost of evaluating 
the expression once. In order to consider these factors, we propose a new form of the 
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FindBestPlan method. The following method-signature summarizes the new form of the 
FindBestPlan method. 

Plan FindBestPlan (Expr e, PhysProp p, CostLimit c, SortOrder pso, int callCount); 

The new FindBestPlan procedure takes two additional parameters. The first of these, 
is the sort order guaranteed on the parameters (outer variables) used by e. The second 
parameter, termed callCount, tells the number of times the expression is expected to be 
evaluated. The cost of the returned plan is the estimated cost for callCount invocations. 
Note that the original Volcano algorithm's interface can be thought of as a special case 
of this enhancement, where the expression e is assumed to have no unbound references 
(parameters) and the callCount is 1. 

2.3.3 Logical Representation of Nested Queries 

We now describe the way in which we represent nested queries in the LQDAG. A nested 
query, in the simplest case, contains two query blocks - an outer query block and an inner 
query block. The inner query block uses parameters whose values are bound from the 
tuples obtained by evaluating the outer query block. We adopt a variant of the Apply 
operator proposed in [T7j for representing nested queries. In its simplest form, the Apply 
operator has two sub-expressions: the bind sub-expression corresponds to the outer query 
block and the use sub-expression corresponds to the parameterized inner query block. 
Conceptually, the Apply operator evaluates the use sub-expression for every tuple in the 
result of the bind sub-expression. After each evaluation of the use sub-expression, the 
Apply operator combines the tuple from the bind sub-expression and the result of the use 
sub-expression. Combining may involve evaluating a predicate such as IN or NOT IN that 
check for set membership, EXISTS or NOT EXISTS that check for set cardinality, a scalar 
comparison (=,7^, >,>,<,<), or a comparison of a scalar with members of a set: relop 
ANY or relop ALL, where relop is one of the comparison operators. Figure 12.21 shows the 
logical representation of the query given in Example 11.11 of Chapter [TJ 

We refer to the bind sub-expression of an Apply operator as its left sub-expression 
and the use sub-expression as its right sub-expression. In general, an Apply operator 
can have multiple use expressions that represent multiple sub-queries nested at the same 
level. In a complex multi-level nested query a sub-expression e may use some variables 

22 




SI o_orderdate NOT IN 

use 



use expr 



Lshipdate 



'Lorderkey = o_orderkey 



ORDERS 



LINEITEM 



Figure 2.2: Example of Representing a Nested Query using Apply (A) 



and bind other variables. The variables that e binds may be passed on to the use sub- 
expressions of parent or ancestor Apply operators; e must be in the left-most subtree of 
such Apply operators. The variables that e uses must be defined at parent or ancestor 
Apply operators; e must be in a use-subtree, i.e., non- left-most subtree, of such Apply 
operators. 

2.3.4 Physical Plan Space Generation 

The physical plan space generation involves generating alternative execution plans for 
a given logical expression and representing them in the PQDAG. In PYRO, two query 
execution plans p\ and p 2 are considered equivalent (i.e., they belong to the same physical 
equivalence class) iff the following conditions are met: (i) p\ and p 2 evaluate the same 
logical expression e, and (ii) pi and P2 produce the result of e in the same sort order. 

We redefine the notion of equivalence of execution plans in PYRO to include the 
parameter sort orders required by the plans. Two plans p\ and p 2 belong to the same 
equivalence class iff pi and p 2 evaluate the same logical expression, guarantee the same 
physical properties on their output and require the same sort order on the parameter 
bindings, when invoked iteratively. Thus, for a given logical expression e and physical 
property p, there exists a set of physical equivalence nodes in the PQDAG. Each equiva- 
lence node in this set corresponds to a distinct sort order requirement on the parameters 
used in e. 

The physical plan space generation step therefore involves the following: given a 
logical equivalence node e, a physical property p required on the result of e and a sort order 
s known to hold on the parameters in e, generate all the evaluation plans and representing 
them in the PQDAG. The search phase of optimization then takes the PQDAG and a call 
count as its inputs and finds the best plan. 



23 



Generating Plans for Non-Apply Operators 

The plan generation step in PYRO works as follows. The LQDAG is traversed, and at each 

logical operation node, all the applicable algorithms (physical operators) are considered. 

A physical operator is applicable if it implements the logical operation and ensures the 

required physical properties on the output. 

Procedure ProcLogicalOpNode 

Inputs: o, a logical operation node in the LQDAG 

p, physical property required on the output 

s, sort order guaranteed on the parameter bindings 

e, the physical equivalence node for the new plans 
Output: Expanded physical plan space. New plans are created under e. 
BEGIN 

For each algorithm a such that a implements o, ensures physical property p on the output 
and requires a sort order s' on the parameters, where s' is subsumed by s 
Create an algorithm node o a under e 
For each input /' of o a 

Let Oj be the i th input (logical equivalence node) of o a 
Let pi be the physical property required from input / by algorithm a 
Set input /' of o a = PhysDAGGen(oj, pi, s) // Main plan generation procedure 

// shown in Figure fZol 

END 

Figure 2.3: Plan Generation at a Non-Apply Node 

To take into account the sort order of parameters, we need a minor modification to 
the plan generation step. For a given logical operation, we consider only those physical 
operators (algorithms) as applicable, whose sort order requirement on the parameters is 
subsumed by the sort order known to hold (guaranteed) on the parameters. As an exam- 
ple, consider a selection logical operator cr R1 a=Pl (Rl), where p\ is a correlation (outer) 
variable. Suppose two algorithms, a plain table scan requiring no sort order and a state 
retaining scan requiring a sort order (pi) on the parameters, are available. Now, if the 
parameter sort order guaranteed by the parent block subsumes (pi), both the algorithms 
(physical operators) are applicable. However, if the parameter sort order guaranteed by 
the parent block does not subsume (pi), then only the plain table scan is applicable. 
Figure 12.31 shows the plan generation steps at a Non-Apply logical operator. 

Plan Generation for an Apply Operator 

Earlier work on Apply operator [T7] attempts to rewrite the Apply operator using joins, 
outer-joins or semi-joins before plan generation. Our goal here is to expand the plan space 
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to include efficient iterative plans for the Apply operator. This involves two steps. 

1. Identify a set of useful and valid sort orders on the sub-query parameters. 

2. Generate iterative plans in which the plan for the sub-query makes use of the sort 
order of parameters produced by the plan for the outer-block. 

Identifying Valid Interesting Sort Orders 

If sub-query has n parameters, there are potentially n\ sort orders on the parameters. 
Considering all possible sort orders of parameters used in an expression is prohibitively 
expensive. Only few sort orders on the parameters are expected to be useful. To consider 
selected sort orders in the optimization process, System R [IS] introduced the notion of 
interesting orders. We extend this notion to sort orders of parameters and call them 
as interesting parameter sort orders. Our algorithm to generate the physical plan space 
creates physical equivalence nodes for only interesting parameter sort orders. 

Intuitively, interesting parameter sort orders are the ones that are useful for efficient 
evaluation of one or more nested blocks (use expressions of an apply operator). Typically, 
the clustering order and query covering indices on base relations used inside the nested 
block(s) decide the interesting parameter sort orders. However, the optimizer must also 
consider plans that explicitly sort a relation in the inner block to match the sort order 
easily available from the outer block. The problem of identifying interesting orders is 
common to both nested queries and joins and deserves special attention. We address this 
problem in Chapter [3j In the rest of this chapter we assume that the set of interesting 
sort orders on unbound parameters of an expression, is available to us. 

Every interesting sort order on the parameters may not be valid (feasible) under the 
given nesting structure of query blocks. For example, consider a query with two levels 
of nesting; q a (Qb(Qc)), la is the outer-most query block and q c is the inner most. Assume 
q c uses two parameters a and b, where a is bound by q a and b is bound by g&. Now, the 
sort order (a, b) is a valid interesting order for q c but not the sort order (b, a). As block 
q that binds parameter b is nested below the block q a that binds parameter a, the sort 
order (b, a) cannot be generated and hence invalid. 

Definition 2.3 Valid Interesting Sort Orders 

A sort order s = (aj., a2, ■ ■ ■ , a n ) on the parameters of a query block b is valid (feasible) iff 
there is a nesting of query blocks such that the following two conditions are satisfied: 
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1. level(di) < level(dj) for all i,j s.t. i < j, where level(a,i) is the level of the query 
block in which at is bound. The level of the outer most query block is considered as 
and all the query blocks nested under a level-i query block have the level i + 

2. Let the BindAttrs of a block bk with respect to a sort order s be defined as follows. 
BindAttrs(bk, s): Attributes in s that are bound by either block bk or by an ancestor 
ofh. 

Then, for each ancestor block bk of b at level k such that level(b) > k + 1 (i.e., 
excluding the parent block ofb), values of BindAttrs(bk, s) are distinct for any two 
invocations ofbk+i- 

The first condition in Definition 12.31 ensures that attributes bound by an outer query 
block always precede attributes bound by an inner query block in any valid sort order. 
The need for the second condition is best explained with an example. Consider a query 
with two levels of nesting; q a (Qb(<lc)), Qa being the outer-most query block and q c being the 
inner most. Suppose block q c makes use of two parameters: parameter a bound at q a and 
parameter b bound at g&. If q a generates duplicate values for a, then (a, b) is not a valid 
parameter sort order for q c . This is because if q a generates duplicate bindings for a, then 
even if the plan for q^ produces the bindings for b in sorted order, q c cannot see a sorted 
sequence on (a, b) across all its invocations; the bindings for attribute b will cycle back for 
the same value of attribute a. Now, if q a is guaranteed to not generate duplicates for a, 
then (a, b) is a valid parameter sort order for block q c . However the sort order (b) is not 
valid (unless q a invokes g& at most once) since even if the plan for block q b produces the 
bindings for b in sorted order, block q c will see a sorted sequence of b values for a single 
invocation from q a , but may not see a sorted sequence on b across multiple invocations 
from q a . 

Generating Plans for the Bind and Use Expressions of an Apply Operator 

Consider a query block q under which blocks qi, q 2 , ■ ■ ■ , q n are nested. This is represented 
by an Apply expression with q as the bind sub-expression and qi, q 2 , ■ ■ ■ , q n as use sub- 
expressions. Generating plans at the Apply node involves the following steps: 

1. For each use expression q iy identify the set of interesting parameter sort orders. 
Identifying a good set of interesting sort orders is a topic of interest not only for 
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nested queries but also for queries with joins, group-by and set operations. We 
address this problem in the next chapter. 

2. For the outer query block q, identify the set s q of sort orders that are available 
readily or at a low cost. 

3. Form the set Lords as s± U . . . U s n U s q . 

4. From the set Lords discard the sort orders that are not valid, under the given nesting 
structure of query blocks. The conditions for validity are specified in definition 12.31 

5. From the set Lords we derive a set Lords consisting of sort orders that are relevant to 
the bind expression q of the Apply operator. Note that the sort orders in Lords can 
contain some attributes that are bound higher up in the complete query structure. 
Deriving Lords from Lords involves extracting the suffix of each order ord G Lords 
such that the suffix contains only those parameters that are bound in q. 

6. For each sort order o G Lords U {e}, where e stands for the empty sort order, 
we generate plans for the bind expression by making o as the required physical 
property on the result (output), and then generate plans for all the use expressions. 
We create a physical operation node a for the Apply operation depending on the 
predicate associated with the Apply node. The plans generated for the bind and use 
expressions are added as the child plans for a. 



Figure 2.4: Sort Order Propagation for a Multi-Level Mult i- Branch Expression 

We now illustrate the working of the above steps using the example expression shown 
in Figure E31 Two sub-expressions e<i = {e<i\ Apply ei2) and are nested under the outer- 
most expression e\. In the figure, we indicate the parameters bound and parameters used 
by each expression with the convention B: and U: respectively. Consider the outer-most 
apply operator present at the root of the expression tree. In step 1, for each of the use 



e 




U:{a,b} 



U:{a,c} 
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sub-expressions i.e., for e 2 and we identify the set of interesting parameter sort orders. 
The interesting parameter sort orders of an expression depend upon the sort orders of base 
relations used in the expression and the correlation predicates. The details of deriving 
interesting orders is the topic of the next chapter. For now, suppose the sub-expression 
e 2 has two interesting sort orders (a) and (a, b) on the parameters it uses, and suppose 
the sub-expression has one interesting sort order (b). In step 2, we identify the set 
of sort orders available at low cost on the output of expression e\. Such sort orders are 
called favorable sort orders and the details of finding the favorable sort orders are given 
the next chapter. For this example, suppose there exists a single favorable sort order (a) 
for expression t\. In step 3, we compute the set Lords as {(a), (a,b), (b)}. In step 4, we 
check for the validity of these sort orders as per Definition 2.3. All the three sort orders 
are valid in this case. We then derive the set Lords in step 5, by extracting the sort order 
suffix relevant to the bind expression e\. e\ being the outer-most block, Lords will be same 
as Lords. In step 6, we generate plans for the use expression e\ with each of the three sort 
orders (a), (a, b) and (b) as the required output sort order, and also generate the plans 
for the use expressions with each of these sort orders as the guaranteed sort order on the 
parameter bindings. The corresponding plans of the bind and use expressions are then 
paired as child plans of a physical apply operator. Note how the set Lords is computed 
for the apply operator at the root of sub-expression e 2 = (e 2 i Apply e 22 ). The set Lords 
of interesting orders for e 22 has a single element (a, c), i.e., Lords={(a, c)}. From this set 
we derive the set Lords as {(c)} since c is the only parameter bound by the expression 
e 2 i, which is the bind expression for the Apply operator in consideration. 

Procedure ProcApplyNode in Figure 12.51 shows the plan generation steps at an Apply 
operator node. The top level procedure for generating the physical plan space is given in 
Figure I2.6[ and it makes use of the two procedures ProcLogicalOpNode and ProcApplyN- 
ode. For simplicity, we omit the cost based pruning from our description and return to it 
later. As a result the callCount parameter does not appear in the algorithm. Figures 12.71 
and 12.81 show the logical query DAG and the resulting physical query DAG (assuming a 
very limited collection of algorithms) for the example of Figure 12.21 

To check if a sort order is valid, we need a mapping from each parameter to the level 
number of the block in which the parameter is bound. In the logical query DAG (LQDAG), 
due to the sharing of common sub-expressions, the mapping of parameters to the level of 
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Procedure ProcApplyNode 

Inputs: o, logical operation node corresponding to the Apply operator in the LQDAG 

s, sort order guaranteed on the parameter bindings 

e, the physical equivalence node for the plans generated 
Output: Expanded physical plan space. New plans are created under e. 
BEGIN 

Form the set i_ords of valid interesting orders on parameters by considering all the input 
sub-expressions of o. 

From the set i_ords, create the set Lords by extracting sort order prefixes of attributes bound 
by o.bindlnput. 

For each order ord in Lords and the empty sort order e 
// Generate plans for the bind expression 
Let l eq = PhysDAGGen(o. bindlnput, ord, s) 
Let newParamOrd = concat(s, ord) 
Let iterOp = New iterator physical op for Apply 
iterOp. bindlnput = l eq 
For each use input u of o 

Let u eq — PhysDAGGen(u, e, newParamOrd) 
Add u eq as a use input of iterOp 
Add iterOp as a child of e 

END 

Figure 2.5: Plan Generation at an Apply Node 



Procedure PhysDAGGen 

Inputs: e, logical equivalence node for the expression 

p, physical property required on the output 

s, sort order guaranteed on the parameter bindings 
Output: Generates the physical plan space and returns the physical equivalence node 
BEGIN 

If a physical equivalence node n p exists for e, p, s in the memo structure 
return n p 

Create an equivalence node n p for e, p, s and add it to the memo structure 
For each child logical operation node o of e 
lf(o is an instance of ApplyOp) 
ProcApplyNode(o, s, n p ) 

else 

ProcLogicalOpNode(o, p, s, n p ) 
For each enforcer fthat generates property p 

Create an enforcer node oj under n p 

Set the input of of = PhysDAGGen(e, e, s) 
return n p 

END 

Figure 2.6: Main Algorithm for Physical Plan Space Generation 



the query block that binds it cannot be fixed statically for each logical equivalence node. 
In fact, a single logical equivalence node can get different level numbers because of the 
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Figure 2.7: LQDAG for the Example of Figure [2721 

Legend 

RS=Rel-Scan 
IDX=Index Lookup 
RTS=Restartable Scan 
PRJ=Project (No DupElim) 

l_shipdate 

S\ =Apply Physical Op 
pred 1 =l_orderkey=o_orderkey 
GSO^orderkej^^"^^" ~ RSOT|o_orderkey pred2= o_orderdate NOT IN 

)predl RSO=Required Sort Order 

[ORDERS] [LINEITEM] [IDX-OKEY] [LINEITEM] GSO=Guaranteed Sort Order 
(clustered) 

Figure 2.8: PQDAG for the Example of Figure O 
level altering transformations such as: 

(R Aexists(cs.c2=r.ci(S)Aexists(o't.c3=r.ciT))) 

((-R l>< -R.cl=S'.c2 S)AEXISTs(o'T.c3=R.clT)) 

In the LHS of the above equivalence rule relation T gets a level number two levels 
higher than R, whereas in the RHS T gets a level number one level higher than R. 
This happens because the RHS uses an outer join to remove the nesting of S within R. 
Figure 12.91 gives a pictorial illustration. 

In general with such transformations, a sub-expression can see a different mapping 
of parameters to levels depending on which expression is chosen above it in a logical plan. 
We can thus get multiple interesting parameter sort orders, corresponding to the different 
nesting structures. In our implementation, we address this issue by carrying along a map 
of parameters to levels when recursively traversing the LQDAG to find valid interesting 
sort orders. A node in the LQDAG could be traversed more than once, if there are 
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T 

Figure 2.9: An Example of Level Altering Transformation 

alternative nesting structures above. Interesting sort orders computed at a node can be 
memoized against the nesting structure to avoid repeated computation. 

2.3.5 Search for Best Plan and Cost-Based Pruning 

At the end of physical plan space generation we will have a physical query DAG with a 
root physical equivalence node. The best plan for the PQDAG is computed recursively by 
adding the cost of each physical operator to the cost of the best plans for its inputs and 
retaining the cheapest combination. 

While computing the plan cost we take into account the fact that the use sub- 
expressions of an Apply operator are evaluated as many times as the cardinality of the 
bind sub-expression of the Apply operator. If caching of sub-query results is employed, 
then the number of distinct correlation bindings will be used in place of cardinality. Each 
physical operator's cost function is enhanced to take an integer n as a parameter and 
return the cost for n invocations of the operator. Memoization of the best plan is done 
for each 4-tuple (expression, output physical properties, input parameter sort order, call 
count). This is required since the best plan may be different for different call counts. 

Optimization with different call counts can potentially increase the cost of optimiza- 
tion. However, if the plan is the same for two different call counts, we can assume that it 
would be the same for all intermediate call counts. The same plan can then be reused for 
all calls with an intermediate call count, with no further memoization required. Results 
from parametric query optimization [25] indicate that the number of different plans can 
be expected to be quite small. This helps in reducing both the number of plans memoized 
and the number of optimization calls. We apply all simple (non-nested) predicates before 
the nested predicate is applied. This further reduces the number of distinct call counts 
with which an expression is optimized. 
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Cost-Based Pruning 

In our description, we ignored cost-based pruning for simplicity and separated the physical 
DAG generation and the search phases. In our actual implementation, the generation 
of the physical plan space and search for the best plan take place in a single phase. 
While generating the physical plan space, the cost of each plan is calculated and the best 
plan seen so far is memoized. We perform cost-based pruning as in the original Volcano 
algorithm [23] . 

2.4 Experimental Results 

We implemented the state-retention techniques in PostgreSQL and carried out a perfor- 
mance study. The optimization techniques were implemented in our Volcano-style opti- 
mizer called PYRO, and these plans were forced on PostgreSQL bypassing it optimizer. 
We considered three algorithms: nested iteration(NI), magic decorrelation(MAG) [19] and 
nested iteration with state retention(NISR). In the case of nested iteration (NI) a suit- 
able index was assumed to be present and used. Whenever a relation was assumed to 
be sorted, the NI plan used a clustered index. Magic decorrelation [19] involves partially 
evaluating the outer query block so as to identify the full set of parameters with which 
the subquery is to be executed. The partial result of the outer query block is called a 
supplementary table. The correlated subquery is then rewritten as a non-nested query 
by using an appropriate type of join with the supplementary table. The rewritten query 
produces the sub-query results for the set of parameters from the supplementary table. 
A join of the rewritten sub-query and the supplementary table to evaluate the remaining 
outer predicates gives the final result. For a more detailed description of the magic decor- 
relation technique we refer the reader to [49J. In our experiments, the plans employing 
magic decorrelation were composed with the supplementary table materialized. 

PostgreSQL did not automatically decorrelate any of the queries we considered, and 
it always used a simple nested iteration plan. Hence, the results noted for the nested 
iteration (NI) algorithm also act as the baseline PostgreSQL measures. The plans employ- 
ing state-retention techniques and magic decorrelation were forced through code changes, 
bypassing the PostgreSQL's optimizer. 

For our experiments, we used the TPC-H [55] dataset on 1GB scale, and an additional 
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relation, dailysales which had 2,500 records. The experiments are described below. 



Experiment 1 

For this experiment, we used the query shown in Example 12.21 which is a minor variant of 
the query given in Example 11.11 of Chapter [TJ The query in Example 11.11 uses a NOT IN 
predicate whose decorrelated form requires an implementation of anti-join, which is not 
available in PostgreSQL. Hence, we changed the predicate to an IN predicate. 

Example 2.2 Query Used in Experiment 1 

SELECT o-orderkey, o_orderdate FROM ORDERS 
WHERE o-orderdate IN ( SELECT Lshipdate FROM LINEITEM 

WHERE Lorderkey = o_orderkey); 



(sees) 




Nl MAG NISR 

Algorithms Considered 

Figure 2.10: Performance Results for Experiment 1 



Figure 12.101 shows the execution times for this query. Magic decorrelation performs 
poorly because there are no outer predicates and no duplicates. This leads to a large 
redundant join in the plan produced by magic decorrelation. Indexed nested loops join 
performs significantly better but is still less efficient than nested iteration with state 
retention. This is due to the overhead of index lookups. This overhead is significant even 
though most of the index pages above the leaf level are cached in memory. 



Experiment 2 



For our second experiment we used the query shown in Example 12. II of Section T2. 2. 31 The 
query lists the days on which the sales exceeded the maximum daily sales seen in the past. 
Figure 12.111 shows the execution times for the plans. 
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Figure 2.11: Performance Results for Experiment 2 

For this query, nested iteration with state retention completely outperforms magic decor- 
relation and plain nested iteration. Due to the presence of a non-equality correlation 
predicate the cost of both magic decorrelation and plain nested iteration increase very 
rapidly with the increase in the number of outer block tuples. Nested iteration with 
state retention performs a single scan of the inner and the outer relations as described in 
Section I2T2T31 



Experiment 3 

For our third experiment we used a query which is a modified version of the TPC-H min 
cost supplier query shown in Example 12.31 below. 



Example 2.3 Query Used in Experiment 3 



SELECT ssname, s-acctbal, s^address, sjphone 
FROM PARTS, SUPPLIER, PARTSUPP 

WHERE sjnation= 'FRANCE' AND p_size=15 AND p_type=' BRASS' AND 
pjpartkey=psjpartkey AND ssuppkey=pssuppkey AND 
pssupplycost = 

( SELECT rain (PSl.ps supply cost) 
FROM PARTSUPP PS1, SUPPLIER SI 
WHERE pjpartkey—PSl.psjpartkey AND 

SI .ssuppkey=PSl .pssuppkey AND 
SI . S-nation= 'FRANCE '); 




The results are shown in figure l2TT2T a). Magic decorrelation performs the best be- 
cause of the low selectivity of the outer predicates. There were only 108 distinct tuples 
satisfying the outer predicates. Restart scan performs poorly in this case as the entire re- 

34 



lation is scanned where only small fraction of it was required. However, as the selectivity 
of the outer predicates increases, NISR becomes more attractive. This is evident from a 
second experiment that we carried out by dropping the predicate "p_size=15". As can be 
seen in Figure l2.12f b). NISR performs better in this case. 




Nl MAG NISR 

Algorithms Considered 

(a) With all outer predicates 



Nl MAG NISR 

Algorithms Considered 

(b) Dropping 'p_size=15' 



Figure 2.12: Performance Results for Experiment 3 



Example 2-4 Query Used in Experiment 4 

Find the turn around time for high priority orders. The turn around time of an order is 
calculated as the maximum of the differences between the ship date and placement date 
of all its line items, if the order price is < 2000 and it is calculated as the maximum of 
the differences between the commit date and placement date otherwise. 

SELECT O-orderkey, turn_around_time (o_orderkey, odotalprice, O-orderdate) 
FROM ORDERS WHERE 0-orderpnority='HIGH'; 

DEFINE turn_around_time (@orderkey, @totalprice, @orderdate) { 
IF (@totalpnce < 2000) 

SELECT max(Lshipdate - ©orderdate) FROM LIN EITEM 

WHERE Lorderkey= ©orderkey; 
ELSE 

SELECT max(Lcommitdate-@orderdate) FROM LINEITEM 
WHERE Lorderkey= ©orderkey; 

} 



Experiment 4 

The query used in our fourth experiment is shown in Example 12.41 For this query, we 
compare only Nl with NISR since decorrelation techniques are not directly applicable. 
The nested iteration plan employed a clustered index lookup on the lineitem table, where 
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as NISR employed two restartable scans. As can be seen in Figure I2.13[ NISR performs 
significantly better than NI; this is because the inner relation (lineitem) is scanned at 
most twice with NISR, whereas NI performs an indexed lookup of the inner relation for 
each tuple in the outer relation. 




NI NISR 
Algorithms Considered 

Figure 2.13: Performance Results for Experiment 4 



Optimization Overheads 

The optimization overheads due to the proposed extensions in the Volcano optimizer were 
negligible. We measured the optimization time for complex nested queries, with up to 10 
levels of nesting, and observed no measurable overheads. 

2.5 Related Work 

Since early '70s, there has been extensive work in query optimization. An excellent sur- 
vey of query optimization techniques is [5|. Nested queries in particular, have received 
significant interest; however, most of the emphasis so far has been on decorrelation tech- 
niques [3H [301 US El EH HE HZl [I2]- Decorrelation of queries on XML databases with 
XQuery as the query language is addressed in [36]. Decorrelation techniques try to rewrite 
a given nested query into a form that does not use the nested sub query construct. Decor- 
relation techniques allow an optimizer to consider alternative set oriented plans such as 
merge join or hash join for evaluating a nested query, and in most cases these methods 
perform better than the naive nested iteration method. The techniques we proposed to 
speed up nested iteration are orthogonal to decorrelation, and a cost-based optimizer 
should consider both decorrelated evaluation and the improved nested iteration methods 
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while choosing the best plan. In this chapter, we show several cases where the improved 
nested iteration methods can perform significantly better than the plans generated by 
decorrelation techniques. 

Techniques for improving the performance of nested iteration have been proposed by 
Selinger et.al. [48] and Graefe [22]. System R [48] uses the idea of caching the inner sub- 
query result for distinct values of correlation variables and sorting the outer tuples which 
allows caching of only one result of the inner query at any point in time. Graefe [22] em- 
phasizes the importance of nested iteration plans and discusses asynchronous 10, caching 
and sorting of outer tuples as techniques that can improve the performance of nested 
iteration. Asynchronous 10 as a means to improve iterative query execution plans is also 
considered by Elhemali et.al. [12] and Iyengar et.al. [26]. Rao and Ross [44] propose an 
approach of improving the efficiency of iterative plans by identifying and reusing invari- 
ants (sub-expressions within the nested sub-query, which are not correlated to the outer 
query). The invariants can be computed just once, in the first invocation, and reused 
for subsequent invocations of the nested sub-query. Akinde and Bohlen [1] argue that 
decorrelation techniques may not always produce the most efficient plans, especially for 
complex OLAP queries, and propose generalized multi-dimensional join (GDMJ) operator 
as an alternative means for efficient evaluation of several types of OLAP queries. 

The techniques we propose in this paper for improving the nested iteration method 
augment the techniques proposed in [48] and [22]. Sorting in System R is purely to ensure 
the cached result can be kept in memory (only one cached result needs be retained). 
Graefe [22] describes sorting of outer tuples to produce advantageous buffer effects in the 
inner query plan. Both [48] and [22] do not discuss the changes required in the optimizer 
to consider these options and generate an overall best plan. Database systems such 
as Microsoft SQL Server consider sorted correlation bindings and the expected number 
of times a query block is evaluated with the aim of efficiently caching the inner query 
results when duplicates are present, and to appropriately estimate the cost of nested query 
blocks [IB] . To the best of our knowledge, the state- retention techniques and optimization 
of multi-branch, multi-level correlated queries considering parameter sort orders have not 
been proposed or implemented earlier. 
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2.6 Summary 



In this chapter we revisited iterative execution plans for nested queries and showed how 
sort order of parameter bindings can be exploited through state retention to improve their 
execution time. For several queries, even when decorrelation is applicable, an iterative 
execution plan might be the most efficient of the available alternatives and hence the 
optimizer's search space should include these improved nested iteration plans and the op- 
timizer must estimate their cost appropriately. We showed how a Volcano style cost-based 
optimizer can be extended to take into account state retention of operators and effects of 
parameter sort orders. We presented a performance study based on our implementation 
of the proposed techniques and the results show significant benefits for several types of 
queries, with no noticeable overheads in the optimization time. 
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Chapter 3 

Sort Order Selection 



The previous chapter showed the use of sort orders (of sub-query parameters) in improving 
the efficiency of iterative execution plans. Sort orders, in general, play an important role 
in query evaluation. Algorithms that rely on sorted inputs are widely used to implement 
joins, grouping, duplicate elimination and other set operations. The notion of interesting 
orders [H] has allowed query optimizers to consider plans that could be locally sub- 
optimal, but produce ordered output beneficial for other operators, and thus produce 
a globally optimal plan. However, the number of interesting orders for most operators 
is factorial in the number of attributes involved. For example, all possible sort orders 
on the set of join attributes are of interest to a merge join. Considering the exhaustive 
set of interesting orders is prohibitively expensive as the input sub-expressions must be 
optimized for each interesting sort order and the corresponding plan must be memoized. 

Deciding a practical set of interesting sort orders is a crucial step in query optimiza- 
tion. The variation in plan cost due to the choice of different sort orders could be very 
high. A plan with carefully chosen sort orders, which exploit clustering/covering indices 
and commonalities between order requirements of multiple operators, can perform signifi- 
cantly better than a plan with naively chosen orders, due to the reduction in sorting cost. 
Sorting cost in a plan can also be reduced by exploiting partially available sort orders. For 
example, a primary index on a subset of attributes involved in the join predicate partially 
fulfills the sort order requirement of merge-join, and can greatly reduce the cost of the in- 
termediate sorting stage. The optimizer must therefore consider partial sort orders while 
choosing interesting orders. Sorting based algorithms for binary operators such as join 
and union, although agnostic to the exact sort order of their inputs, require a matching 
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order from the two inputs. These factors make the problem of choosing sort non-trivial. 

In this chapter we address the problem of choosing interesting sort orders. We use 
join expressions for our description. However, the solution can be used for choosing 
input sort orders for any sorting based operator as well as for choosing sort orders on 
sub-query/UDF parameters. 

The rest of this chapter is organized as follows. Section [37TI describes how partial sort 
orders can be exploited during sorting and proposes extensions to a cost-based optimizer 
to account for their benefits during plan generation. In Section [3T21 we show that a special 
case of the problem of selecting globally optimal sort orders is NP-hard and give a 1/2- 
benefit approximation algorithm to handle the case. Although the problem is intractable, 
the knowledge of available indices and sort order propagation properties of physical op- 
erators allows us to provide a good heuristic approach, which we describe in Section 13.31 
Section 13.41 shows how the solution can be used to obtain interesting parameter sort or- 
ders for nested sub-expressions. We present our experimental results in Section 13.51 We 
discuss related work in Section 13.61 and summarize the work in Section 13.71 



Often, sort order requirements of operators are partially satisfied by indices or other oper- 
ators in the input subexpressions. A prior knowledge of partial sort orders available from 
inputs allows us to efficiently produce the required (complete) sort order more efficiently 
When operators have flexible order requirements, it is thus important to choose a sort 
order that makes maximum use of partial sort orders already available. We motivate the 
problem with an example. Consider the query shown in Example 13.11 Such queries fre- 
quently arise in consolidating data from multiple sources, e.g., in extract-transform-load 
(ETL) tasks. The join predicate between the two catalog tables involves four attributes 
and two of these attributes are also involved in another join with the rating table. Further, 
the order-by clause asks for sorting on a large number of columns including the columns 
involved in the join predicate. 

The two catalog tables contain 2 million records each, and have average tuple sizes 
of 100 and 80. We assume a disk block size of 4K bytes and 10000 blocks (40 MB) of 
main memory for sorting. The table catalog! is clustered on year and the table catalog2 



3.1 Exploiting Partial Sort Orders 
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Example 3.1 A Query with Many Join and Grouping Attributes 



SELECT el. make, el. year, cl.city, cl. color, cl.sellreason, c2. breakdowns, r. rating 
FROM catalogl cl, catalog2 c2, rating r 

WHERE c 1. city =c2. city AND cl .make=c2.make AND cl.year=c2.year AND 

cl.color=c2. color AND cl.make=r.make and cl. y ear =r. year 
ORDER BY cl. make, cl.year, cl. color, cl.city, cl.sellreason, c2. breakdowns, r. rating; 



sort-4 (~)PlanCost=530,345 

K ' 2M X 150 bytes 

(100) D>i<3 

2MX 120 bytes^ 




sort- 
(100) 
2MX 1 00 bytes 

catalogl 
C.Idx Scan(50K) 



2M X S<p bytes 2K X 40 bytes 
rating 

Table scan(40K) Table scan (40) 



sort-1: (y) — > (y, m, c, co) 

sort-2: (m) — > (y, m, c, co) 

sort-3: ( ) — > (y, m) 

sort-4: (y, m, c, co) — > (m, y, co, c,...) 



40Plan Cost=290,410 

(100)|^<] 



sort 
(100) 




catalogl catalog2 rating 

Table scan (50K) C.Idx Scan (40K) Cov. Idx Scan (10) 

sort-1: (y) — > (m, y, co, c) 

sort-2: (m) — > (m, y, co, c) 

sort-3: (m) — > (m, y) 

sort-4: (m, y, co, c) — >(m, y, co, c, ...) 



Figure 3.1: A Naive Merge- Join Plan 



Figure 3.2: Optimal Merge- Join Plan 



is clustered on make. The rating table has a secondary index on the make column, and 
the index includes the year and rating columns in its leaf pages {i.e., the index covers the 
query) . Figures 13.11 and 13.21 show two different plans for the example query. Numbers in 
the parentheses indicate estimated cost of the operators in number of I/Os (CPU cost is 
appropriately translated into I/O cost units). Edges are marked with the number of tuples 
expected to flow on that edge and their average size. For brevity, the input and output 
orders for the sort enforcers are shown using only the starting letters of the column names. 
Though both plans use the same join order and employ sort-merge joins, the second plan 
is expected to perform significantly better than the first. 

3.1.1 Changes to External Sort 

External sorting algorithms have been studied extensively but in isolation. The standard 
replacement selection [32] for run formation well adapts with the extent to which input 
is presorted. In the extreme case, when the input is fully sorted, it generates a single 
run on the disk and avoids merging altogether. Larson [33] revisits run formation in 
the context of query processing and extends the standard replacement selection to handle 
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variable length keys and to improve locality of reference. Estivill-Castro and Wood [T3] 
provide a survey of adaptive sorting algorithms. The technique we propose in this section 
to exploit partial sort orders is a specific optimization in the context of multi-key external 
sorting. We observe that, by exploiting prior knowledge of partial sort order of input, it 
is possible to eliminate disk I/O altogether and have a completely pipelined execution of 
the sort operator. 

We use the following conventions: o, 0\, 02 etc. refer to sort orders. A sort order of size n is 
a sequence of attributes/columns (ai,a 2 , . . . ,a„). Sort direction (ascending/descending) 
is ignored; our techniques are applicable independent of the sort direction. 

• e : Empty (no) sort order 

• attrs(o) : The set of attributes in sort order o 

• |o| : Number of attributes in the sort order o 

• 0\ < 02 : Order 02 subsumes order 0\ (o\ is a prefix of 02) 

• o\ < 02 : Order 0\ is a strict prefix of 02 

Consider a case where the sort order to produce is (coh, C0I2) and the input already 
has the order (co/i). Standard replacement-selection writes a single run with all the tuples 
to the disk and reads it back again; this breaks the pipeline and incurs substantial I/O 
for large inputs. It is not difficult to see how the standard replacement selection can be 
modified to exploit the partial sort orders. Let o = (01,02, .. . , a n ) be the desired sort 
order and d = (a 1; a2, . . . , Ofc), k < n be the partial sort order known to hold on the 
input. At any point during sorting we need to retain only those tuples that have the 
same value for attributes ai, 02, ■ ■ ■ , When a tuple with a new value for these set of 
attributes is read, all the tuples in the heap (or on disk if there are large number of tuples 
matching a given value of a%, 02, ... , a^) can be sent to the next operator in sorted order. 
Thus in most cases, partial sort orders allow a completely pipelined execution of the sort. 
Exploiting partial sort orders in this way has several benefits: 

1. Let o = (ai, 02, . . . , a n ) be the desired sort order and d = (01, 02, ■ ■ ■ , a&), k < n be 
the partial sort order known to already hold on the input. We call the set of tuples 
that have the same value for attributes (ai, 02, . . . , a^) as a partial sort segment. If 
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each partial sort segment fits in memory (which is quite often the case in practice), 
the entire sort operation can be completed without any disk I/O. 

2. Exploiting partial sort orders allows us to output tuples early (as soon as a new 
segment starts). In a pipelined execution this can have large benefits. Moreover, 
producing tuples early has immense benefits for Top-K queries and situations where 
the user retrieves only some result tuples. 

3. Since sorting of each partial sort segment is done independently, the number of 
comparisons are significantly reduced. Note that we empty the heap every time 
a new segment starts and hence insertions into heap will be faster. In general, 
independently sorting k segments each of size n/k elements, has the complexity 
0(n log(n/k)) as against 0(n login)) for sorting all n elements. Further, while 
sorting each partial sort segment comparisons need to be done on fewer attributes, 
(a k+ i, . . . , a n ) in the above case. 

Our experimental study presented in Section 13.51 confirms that the benefits of ex- 
ploiting partial sort orders can be substantial, and yet none of the systems we evaluated 
exploited the partial sort orders. 

3.1.2 Optimizer Extensions for Partial Sort Orders 

In this section we assume operators have fixed sort order requirements, and we focus only 
on incorporating partial sort orders. We deal with flexible sort order requirements of 
operators in subsequent sections. 
We use the following notations: 

• Oi A o 2 : Longest common prefix between sort orders Oi and 02 

• o A s : Longest prefix of sort order o such that each attribute in the prefix belongs 
to the attribute set s 

• o\ + 02 : Sort order obtained by concatenating 0\ and 02 

• Oi — o 2 : Sort order d such that o 2 + d = 0\ (defined only when o 2 < 0\) 

• coe(e, 01,02) '■ The cost of enforcing order 02 on the result of expression e which 
already has order o\ 

• N(e) : Expected size, in number of tuples, of the result of expression e 

43 



• B(e) : Expected size, in number of blocks, of the result of expression e 

• D(e,s) : Distinct values for attribute(s) s of expression e. D(e,s) = N(U s (e)) 

• cpu-Cost(e, o) : CPU cost of sorting the result of e to get order o 

• M : Number of memory blocks available for sorting 

The Volcano optimizer framework [23J assumes that an algorithm (physical operator) 
either guarantees a required sort order fully or it does not. Further, a physical property 
enforcer (such as sort) only knows the property to be enforced and has no information 
about the properties that hold on its input. The optimizer's cost estimate for the enforcer 
thus depends only on the required output property (sort order). In order to remedy these 
deficiencies we extended the optimizer in the following way: consider an optimization 
goal (e, o), where e is the expression and o the required output sort order. If the physical 
operator being considered for the logical operator at the root of e guarantees a sort order 
d < o, then the optimizer adds a partial sort enforcer enf to enforce o from o'. We use 
the following cost model to account for the benefits of partial sorting. 



coe(e, e, o) = < 



cpu-cost(e,o) if B(e) < M 

B(e)(2\log M -i(B(e)/M)] +1) otherwise 



If e is known to have the order oj, we estimate the cost of obtaining an order o 2 as follows: 
coe(e, 01, o 2 ) = D(e, attrs(o s ))*coe(e', e, o r ), where o s = o 2 Aoi, o r = o 2 — o s and e' = o~ p (e), 
where p equates attributes in o s to an arbitrary constant. Intuitively, we consider the cost 
of sorting a single partial sort segment independently and multiply it by the number of 
segments. Note that we assume uniform distribution of values for attrs(o s ). Therefore, 
we estimate JV(e') = \N(e)/D(e, attrs(o s ))] and B(e') = \B(e)/D(e, attrs(o s ))]. When 
the actual distribution of values is available, a more accurate cost model that does not 
rely on the uniform distribution assumption can be used. 



3.2 Choosing Sort Orders for a Join Tree 

Consider a join expression e = e\ M e 2 , where ei,e 2 are input subexpressions and the 
join predicate is of the form: (ei.ai = e 2 .ai and ei.a 2 = e 2 .a 2 . . . and e\.a n = e 2 .a n ). 
Note that, w.l.g., we use the same name for attributes being compared from either side 
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and we call the set {ai, ... , a n } as the join attribute set. In this case, the merge join 
algorithm has potentially n\ interesting sort orders on inputs t\ and 62^. The specific 
sort order chosen for the merge-join can have significant influence on the plan cost due 
to the following reasons: {%) Clustering and covering indices, indexed materialized views 
and other operators in the subexpressions ei,e2 can make one sort order much cheaper 
to produce than another, iii) The merge-join produces the same order on its output as 
the one selected for its inputs. Hence, a sort order that helps another operator above the 
merge-join can help eliminate a sort or just have a partial sort. In this section we show 
that a special case of the the problem of choosing optimal sort orders for a tree of merge- 
joins is NP-Hard and provide a 1/2 benefit approximation algorithm for the problem_|. In 
the next section, we describe our heuristics for a more general setting of the problem in 
which we make use of the proposed 1/2 benefit approximation algorithm. 



3.2.1 Finding Optimal is NP-Hard 

We now show that the problem of choosing optimal sort orders is NP-Hard by considering 
a special case of the problem. Let e = R\ M R 2 M R3 ■ ■ ■ x R n be a join expression 
with conjunctive join predicates on n relations, where n is a power of 2. Let T be a 
balanced join order tree for e. Figure 13.31 shows an example. For each join node v 
in T, we assign an attribute set L v (called representative join attribute set), which is 
constructed as follows. If is an attribute involved in the join predicate of v then 
TC(a,i) G L v , where Ti{ai) is the representative of the attribute equivalence class in the 
result of e. Two attributes ai and aj belong to the same attribute equivalence class if 
they are equated directly or transitively in the join predicates of e. The representative of 
an equivalence class is an arbitrarily chosen attribute belonging to the equivalence class. 
For example, if the predicate R\.a\ = R<i-Oi A Ri.a-i = Rz-d?, is part of the join predicates 
of e, then R\.a\,Ri.a,2 and R^.a?, belong to the same attribute equivalence class, and we 
will have H(Ri.ai) = Ti(R2.a 2 ) = 7i(i? 2 .02) = R\-CLi- In Figure I3T31 we have shown the 
representative join attribute sets for each join node. For brevity, we omit the relation 

1 We assume merge-join requires sorting on all attributes involved in the join predicate. We do not 

consider orders on subsets of join attributes since the additional cost incurred at merge-join matches the 

benefit of sorting a smaller subset of attributes. 

2 The work was carried out in collaboration with Ajit A. Diwan and Ch. Sobhan Babu. 
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name qualifiers for attributes. 



R1.a=R4.a A R2.d=R3.d 




R1 R2 R3 R4 



Figure 3.3: A Join Tree with Representative Join Attribute Sets 

Now, suppose all the base relations and intermediate results in T are of the same size 
and no indices are present on the base relations. The problem of choosing optimal sort 
orders for each join requires us to choose permutations of representative join attribute 
sets such that we minimize the cost of intermediate sorts. The cost of sorting depends 
on the sort order already present on the input and the sort order required. In general, 
the sort cost on any edge (vi,Vj) of the tree is a monotonically decreasing function of 
the length of common prefix between attribute permutations chosen for Vi and Vj. For 
example, see our cost model for sort presented in Section I3.1.21 We define the benefit of 
a solution to be ^2 v . Vj<EE f(\Pi ^Pj\), where E is the set of edges in the tree, Pi,Pj are 
attribute permutations chosen by the solution for vertices Vi, Vj and / is any monotonically 
increasing function in the length of the common prefix (\pi A pj\), with at origin (i.e., 
f(0) = 0). Minimizing the sorting cost requires maximizing the total benefit. 

Figure I3.4[ shows an example along with a solution, which maximizes the total 
benefit assuming f(\pi Apj\) = \pi A pj\. The representative join attribute set for each 
join node is shown in curly braces besides the node. Permutations chosen by the solution 
are indicated with angle brackets and the number on each edge shows the benefit for that 
edge. Below we state the problem formally. 

Problem 1 (Common Prefix Problem) Let T be a tree having n vertices, the vertex set 
being V(T) and the edge set being E(T). Each vertex Vi (i = 1, . . . ,n) is associated with 
an attribute set S{. Let f be any non- decreasing function with f(0) = 0. Find a set of 
attribute permutations Pi,p% ■ ■ ■ ,p n > where pi is a permutation of set Si, such that the 
benefit function T = ^2y ViVje E(r) f(\Pi A Pj\) ^ s 'maximized. 
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Figure 3.4: A Special Case of Choosing Globally Optimal Sort Order 

We prove that Problem [1] is NP-Hard even for binary trees. To do so we consider the 
special case where f(\pi A pj\) = \pi Apj\. 

Before giving a proof for this theorem, we will study some well known NP-Hard problems, 
which are reducible to the Common Prefix Problem. 

Problem 2 (Sum-Cut) 177] / Given a graph G with m vertices, number the vertices of G 
as 1, . . . , m such that Yli<i<m c * * s minimized, where q is the number of vertices numbered 
< i that are adjacent to at least one vertex numbered greater than i. 

The Sum-Cut problem can be rephrased as follows: given a graph G with m vertices, 
number the vertices of G as 1, . . . , m such that J2i<i< m ^ * s maximized, where q is the 
number of vertices numbered < i that are adjacent to no vertex numbered greater than i. 
Let G' be the complement graph of G. The complement graph G' of G contains an edge 
(vi,Vj) iff(vi,Vj) is not present in G. On the complement graph G', it is straight-forward 
to see that the Sum-Cut problem is equivalent to Problem [3] given below. 

Problem 3 (Mod-Sum- Cut) Given a graph H with m vertices, number the vertices of H 
as 1,2, ... ,m such that J2i<i<m Q.i * s maximized, where g« is the number of vertices that 
are adjacent to all the vertices numbered greater than i. 

First, we reduce the Mod- Sum- Cut problem to the Common Prefix Problem on star 
trees. A star tree or simply a star of n vertices is a tree with a root and n — 1 leaf vertices. 

Lemma 3.1 The Common Prefix Problem is NP-Hard for stars. 
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Proof: We reduce the Mod-Sum-Cut problem to the Common Prefix Problem on stars, 
with the function / set to \pi A pj\ (i.e., the length of the longest common prefix). Let 
graph G with m vertices be the given instance of Mod-Sum-Cut problem. Let vi, . . . ,v m 
be the vertices of G. We construct an instance of the Common Prefix Problem on stars 
as follows: let S be a star having m + 1 vertices, with its root and U\, . . . , u m as 

its leaves. The attribute set of root u r is chosen to be the set of all vertices in G (i.e., 
{v i, . . . , v m }), and the attribute set of each leaf U{ is chosen to be adj(vi), where adj(vi) is 
the set of all vertices adjacent to Vi in graph G. A pictorial illustration of the construction 
is shown in Figure 13.51 




u ! : adj (vj) u 2 : adj (v 2 ) adj (v^ 

Figure 3.5: Reducing Mod-Sum-Cut to Common Prefix on Star 

Now, we show that there exists a solution of value k for Mod-Sum-Cut on G iff there 
exists a solution of value k for the Common Prefix Problem on S. 

Suppose there exists a solution of value k for Mod-Sum-Cut on G. Let the order 
of vertices in the solution be v g m, u fl (2), • • • , v g(m), where g is a permutation on 1, . . . , m 
(i.e., a one-to-one function from {1, . . . , m} to {1, . . . , m}). We construct the solution for 
the corresponding Common Prefix Problem (for star S) as follows: for the root vertex 
u r , we choose the attribute permutation to be o r = f 9 ( m ), v g ( m -i), ■ ■ ■ , v g(i)- For each leaf 
vertex Ui, we choose a permutation of its attribute set adj(vi) such that the length of 
the longest common prefix | Oj A o r \ is maximum. 

In the solution ordering for Mod-Sum-Cut, let Zj be the smallest integer such that 
Vi is adjacent to all vertices in the set i^+i), . . . , v g ( m ). This implies the following: (i) 
in the solution value for Mod-Sum-Cut, vertex Vi will be counted m — U times, i.e., 
k = ^2 v .(m — li), and (ii) in the corresponding Common Prefix Problem, there exists 
a common prefix of length m — li between the permutations chosen for U{ and the root 
u r . This shows there exists a solution of value k for the corresponding Common Prefix 
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Problem. 

Now, suppose there exists a solution of value k for the Common Prefix Problem 
on S. In the solution, let the attribute permutation chosen for the root vertex u r be 
o r = v h{i)i v h(2), ■ ■ ■ , Vh{m), where h is a permutation on 1, . . . , m. Now, we construct the 
solution for Mod- Sum- Cut on G by reversing the order of attributes in o r , i.e., by ordering 
the vertices of G as v h ( m ) , Uft(m-i) , • • • , v h (i)- 

In the solution for the Common Prefix Problem on S, let o, be the permutation 
(of set adj(vi)) chosen for leaf Uj. Let l{ denote the length of the longest common prefix 
between 0{ and o r , i.e., U = |oj Ao r \. We observe that, the solution value k = Xa<i<m(^)- 
In the corresponding solution for Mod-Sum-Cut on G, li will be the smallest integer such 
that vertex Vi is adjacent to all vertices in the set Vh( m ), ■ ■ ■ ,Vh(m-ii+i)- Hence, in the 
solution value for Mod-Sum-Cut on G, vertex t>j will be counted U times. Therefore, the 
solution for Mod-Sum-Cut will have a value of ^2 1<i<m (h) — k. □ 

Theorem 3.2 Problem^ is NP-Hard even for binary trees. 

Proof: We reduce the Common Prefix Problem on stars to the Common Prefix Prob- 
lem on binary trees. Let S be a star with u r as its root and u\, . . . ,u m as its leaves. 
Let a r denote the set of attributes associated with u r and 01,02, . . • , a m denote the set 
of attributes associated with vertices Ui,U2, ...,u m respectively. We now construct an 
instance of the Common Prefix Problem on binary trees as follows: let T be a binary 
tree with 2m vertices, with r%, r 2 , . . . , r m as its internal vertices and W\,W2, ■ ■ ■ , w m as its 
leaves. Let the edge set E(T) be {r^r^i : 1 < i < m} U {riWi : 1 < % < m}. Each internal 
vertex r*j is assigned the attribute set A = a r U C, where L is an arbitrarily chosen set of 
attributes of size > m x \a r \ and is disjoint from a r U ai U . . . U a m . Each leaf vertex u>j 
is assigned the attribute set a«. Figure I3T61 pictorially illustrates the construction. In the 
figure, the attribute sets a r and a\, . . . ,a m for the star are assumed to be as in Figure 13751 
Let Z = (m- 1) x |^4|. 

First, we show that if there exists a solution of value k for the Common Prefix 
Problem on S then there exists a solution of value k + Z for the Common Prefix Problem 
on T. 

Suppose there exists a solution of value k for the Common Prefix Problem on S. Let 
the o r be the attribute permutation assigned for u r and 0{ be the attribute permutation 
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Figure 3.6: Reducing the Common Prefix Problem on Stars to the Common Prefix Prob- 
lem on Binary Trees 

assigned for each Ui, 1 < i < m. We construct a solution for T as follows: for each 
intern vertex r i; we assign the permutation o r + oi, where o; is a fixed permutation of C, 
chosen arbitrarily. For each leaf vertex Wi, we assign the permutation Oj. Since the same 
permutation is chosen for all the internal vertices, each of the (m — 1) pairs of adjacent 
internal vertices will have a common prefix of length \A\. Further, each of the m pairs 
of internal and leaf vertices that are adjacent to each other will have a common prefix of 
length \oi A o r \. As k = Xa<i<m(l°* A °r|) s we conclude the solution value for T is k + Z. 

Next, we show that if there exists a solution of value k for the Common Prefix 
Problem on T then there exists a solution of value k — Z for the Common Prefix Problem 
on S. To do so, we make use two supporting lemmas, Lemma T3. 31 and Lemma I3~^E1 Below 
we state and prove these lemmas and then continue with the proof of Theorem 13.21 

Lemma 3.3 In any optimal solution for the Common Prefix Problem on T , all the in- 
ternal vertices are assigned an identical permutation p. 

Proof: Let T opt be an optimal solution for T. In the optimal solution, let pi,...,p m 
be the permutations assigned to internal vertices r 1; . . . , r m respectively. We prove that 
\pi A Pi+i\ = \A\ for 1 < i < m, which essentially proves this lemma. 
Case 1: Suppose \pi Api+i\ < \a r \ for some i, 1 < i < m. 

This implies, the total benefit of T opt , Ben(T opt ) < (m — 2) x |^4] + (m + 1) x \a r \. Since 
\C\ > m x \a r \, we have |^4| > (m + 1) x \a r \. Therefore, Ben(T opt ) < (m — 1) x |^4|. 
However, we know that there exists a solution for T with benefit of at least (to — 1) x |^4|. 
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This is because each of the internal vertices have the same attribute set of size |^4|. This 
contradicts the given fact that T opt is optimal. Therefore, we conclude that our assump- 
tion: \pi A Pi+i | < \a r \ for some i, 1 < i < m, cannot be true. 

Case 2: Suppose \p { A p i+ i\ > \a r \ for all 1 < i < m, but \p+ Apj+i| < |^4| for some i, 
1 < % < m. 

Given a permutation p, we use the notation p[j] to denote the attribute at the j th position, 
where 1 < j < \p\. The condition for Case 2 implies the following: 

(a) Pi[j] = p 2 [j] = ■■■= p m [j] for 1 < j < \a r \. 

(Jo) in T opt , the total benefit of edges incident between internal vertices J2i<i< m (\Pi ^Pi+±\) 
must be less than \A\ x (m — 1). 

Now, consider a solution T' opt for T in which each leaf vertex is assigned the same 
permutation as in T opt and all the internal vertices are assigned an identical permutation 
p constructed as follows: the first \a r \ attributes of p are chosen in the same order as the 
first \a r \ attributes in pi for any 1 < i < m (i.e., p\j] — Pi\j], 1 < j < \a r \), and the next 
— a r \ attributes are chosen an in an arbitrary order. 

We observe that in both T' opt and T opt the total benefit of edges incident from internal 
vertices to leaf vertices remains the same. However, in T' optl the total benefit of edges 
incident between internal vertices will be |^4| x (m— 1) (this is because all internal vertices 
are assigned an identical permutation). This implies, the total benefit of T' t is larger 
than that of T opt , which contradicts the given fact that T opt is optimal. Therefore, we 
conclude the assumption made for Case 2 cannot be true. 

We thus conclude in every optimal solution T opt , all the internal vertices are assigned 
an identical permutation, completing the proof of Lemma 13.31 □ 
Next, we state and prove our second supporting lemma. 

Lemma 3.4 There exists an optimal solution for T such that, in the permutation p chosen 
for the internal vertices, every attribute in set a r occurs before any attribute in set C 
occurs. 

Proof: Let T opt be an optimal solution for T. In T opt , let pi, . . . ,p m be the permutations 
assigned to the internal vertices r 1; . . . , r m respectively. From Lemma 13.31 we know that 
all the internal vertices are assigned an identical permutation; let pi — pi — . . . — p m — p. 
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Suppose there exist x, y such that x < y, p[x] G £ and p[y] G a r . We now modify 
Px, . . . ,p m as follows: in each p i; we swap Pi[x] with Since there is no attribute com- 
mon to the set £ and the attribute sets associated with the leaf vertices, this modification 
cannot decrease the total benefit of T opt . This modification can be repeated until all the 
attributes in a r appear before the attributes in £ in the permutation p. □ 

From Lemmas 13.31 and 13.41 we can make the following statement: if there exists a 
solution of value k for the Common Prefix Problem on T, then there exists a solution T opt 
of value at least k, in which, all internal vertices are assigned an identical permutation p 
and \p A a r \ = \a r \. 

We now construct a solution for the star S as follows: for the root vertex u r we 
assign the permutation p A a r . For each leaf vertex Ui, we assign the permutation chosen 
for the corresponding leaf Wi in the solution T opt . In T opt , the maximum benefit which 
can be contributed by edges incident between internal vertices of T is Z. Therefore we 
conclude the corresponding solution on 5* should have a benefit of at least k — Z. □ 

3.2.2 A Polynomial Time Algorithm for Paths 

We now present an efficient algorithm for solving the Common Prefix Problem, when the 
tree is a path. The algorithm employs dynamic programming. Note that left-deep and 
right-deep join plans result in problem instances on paths. 

Theorem 3.5 Let v i, v 2 , . . . , v n be a path, where each vertex Vi is associated with an 
attribute set s^. The optimal solution of Common Prefix Problem for any segment of 
the path, OPT(i,j) = max{ OPT(i, k) + OPT{k + 1, j) + f(c{i, j)) } over all i < k < j, where 
c(i,j) is the number of common attributes for the segment 

Proof: 

Case 1: Let c(i, j) = 0, i. e., there exists no attribute common to all vertices v^, f j+i, . . . , Vj. 
Consider an optimal solution for the path fj, . . . ,Vj. Let p x be the attribute permuta- 
tion assigned by the optimal solution to vertex v x , i < x < j. The optimal solution 
must contain two vertices Vk,Vk+i such that the benefit for the edge (vk,Vk+i) is 0, i.e., 
\Pk A Pk+i\ — 0. This directly follows from the assumption of Case 1, c(i,j) = 0. Now, 
the problem can be independently solved for the two segments (vi,Vk) and (vk+i,Vj) and 
OPT(i,j) = OPT(i, k) + OPT(k + 
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Case 2: Let c(i,j) 7^ 0. Let s(i,j) be the set of attributes common to all vertices 
Vi,...,Vj. Note that the cardinality |s(i,j)| = c(i,j). Let o s be an arbitrarily chosen 
permutation of set s(i,j). We claim that there exists an optimal solution such that, for 
every vertex v x (i < x < j) the attribute permutation p x chosen by the optimal solution 
has o s as its prefix. To see this, consider an optimal solution in which o s is not a prefix 
of some p x . We can then reorder the permutations assigned to the vertices, without a 
decrease in the total benefit OPT(i,j), so as to have o s as the prefix of each p x , i < x < j. 

Let v' { , . . . ,Vj be a path where each vertex v' x is associated with the attribute set 
s x — i.e., v' x has all the attributes of v x except those in s(i,j). We can see that the 

value of the optimal solution OPTiy^Vj) is given by Equation 13. 11 

OPT{v u v 3 ) = OPT^) + (j - i) x f(c(ij)) (3.1) 

Now, consider the path v' { , . . . , v'y From our construction of the path, we know there are 
no attributes common to all the vertices of v' { , . . . , v'y Therefore, we have: 

OPTiyl v'j) = OPT{v[, k') + OPT(v' k+1 , Vj) for some k', < k' < n. (3.2) 

From the construction of the path v' i ,...,v r j, we have: 

OPT(v' t ,v' k ) = OPT(v t ,v k ) - (k — i) x f(c(i,j)) (3.3) 

OPT« +1 ,t/.) = OPT(v k+1 , Vj ) - (j - k - 1) x f(c(ij)) (3.4) 
Substituting from Equations 13.31 and 13.41 in Equation I3.2[ we get: 

OPT(vlVj) = OPT(v u v k )+OPT(v k+1 , Vj )-(]-i-l)xf(c(i,])) for some k', < k' < n. 

(3.5) 

Substituting Equation 13.51 in Equation 13.11 we get: 

OPT{vi,Vj) = OPT(vi,v k ) +OPT(v k+1 , Vj ) +f(c(i,j)) for some k',0 < k' < n. (3.6) 
Hence the proof. □ 

Procedure PathOrder in Figure [3T7I computes optimal attribute permutations for any 
path (l,n), where each vertex i, 1 < % < n, is associated with an attribute set s[i}. The 
procedure uses dynamic programming and computes solutions bottom up starting from 
path segments of length (single vertices). The procedure begins by assigning a benefit of 
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to all path segments (i, i), 1 < i < n. It then constructs solutions for paths of increasing 
length. For each path (i, i+j), of length j, 1 < j < n — 1, the value of k, which maximizes 
ben=benefit(i, k) + benefit(k + 1, i+j) is identified. The values of k, ben and the attributes 
common to all the vertices of path (i, i+j) are remembered (memoized). Finally, the sub- 
procedure MakePermutation is used to construct the attribute permutations p[l], . . . ,p[n] 
using the memo structure. The first call to procedure MakePermutation is made with 
parameter i set to 1 (the first vertex on the path) and j set to n (the last vertex on 
the path), and each of the attribute permutations p[l], ■ ■ ■ ,p[n] initialized with an empty 
sort order. Procedure MakePermutation constructs the attribute permutation for each 
of the vertices i, i + 1, . . . ,j as follows: a permutation cp of commons(i, j) [i.e., the set 
of attributes common to all the vertices from i, . . . ,j) is chosen at arbitrary, cp is then 
appended to each of p[i], . . . ,p\j]. The common attributes for segment (i,j) are then 
removed from the common attributes of all subpaths of (i,j). The optimal split point m 
for the path segment (i,j) is read from the memo structure, and the construction of the 
permutations continues recursively on subpaths (i,m) and (m + l,j), until i — j (i.e., a 
single vertex). The overall time complexity of procedure PathOrder is 0(n 3 ). 

3.2.3 A 1/2 Benefit Approximation Algorithm for Binary Trees 

For binary trees we propose an approximation with benefit at least half that of an optimal 
solution. Note that our approximation guarantee implies at least half the best possible im- 
provement over the worst case sort cost. This however, does not imply a 2-approximation 
on the total cost. 

We split the tree into two sets of paths, P Q and P e . P Q has the paths formed by edges 
incident from odd levels and P e has those formed by edges incident from even levels, 
Figure 13.81 shows an example. We then find an optimal solutions for each of the two sets 
of paths. Note that this gives us two solutions for the complete tree, because each set of 
paths covers all the vertices of the tree (for any left over vertices at the leaf level or the 
root, we choose an arbitrary permutation). Let the optimal solutions for the two sets of 
paths be S Q and S e and the corresponding benefits be ben(S Q ) and ben(S e ). Let the set of 
edges included in P and P e be denoted by E a and E e respectively. Consider an optimal 
solution St for the whole tree. In the optimal solution, let the sum of benefits of all edges 
in E Q be odd-ben(ST) and that of edges in E e be even-ben(ST)- Note that ben(S D ) > 
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Procedure PathOrder 

Input: s[n] : array of attribute sets 

Output: p[n] : array of permutations or sort orders 

Data Structures: 

benefit[n][n], splitfn] [n] : arrays of integers 

commons[n][n] : array of attribute sets 

apermute(s) : Function that returns an arbitrary permutation of attribute set s 
BEGIN 

for i=l to n 

benefitfi] [i] = 0; p[i] = e; commons[i][i] = s[i]; split[i][i] = -1; 
for j=l to n-1 / / Consider path segments of length j 

for i = 1 to n-j // Consider path segment (i, i+j) 

Let k be the index such that i < k < (i+j) and benefit[i][k]+benefit[k+l][i+j] 

is maximum 

commons[i][i+j] = commons[i][k] n commons[k+l][i+j]; 

benefit[i][i+j] = benefit[i][k] + benefit[k+l][i+j] + /(|commons[i][i+j]|); 

split[i][i+j] = k; 

Call MakePermutation(l, n); // Form the attribute permutations 

END PROC 

// Procedure to construct attribute permutations from the memo structure, in which 
// the optimal split point and common attributes are remembered. 

Procedure MakePermutation(i, j) 

BEGIN 

Let ca = commons[i][j]; // Attributes common to all the vertices from i to j 
Let cp = apermute(ca); //An arbitrarily chosen permutation of ca. 
for k=i to j 

Append cp to p[k]; 
if(i=j) 

return; 

// Remove the common attributes from all subpaths of (i,j), so that the 
// attributes do not repeat. 
For all i', j' s.t. i' > i and j' < j 

commons[i'][j'] = commons[i'][j'] — ca; 
// Construct the permutations of remaining attributes for the two subpaths, 
//to the left and right of the the split point, 
m = split[i][j]; 
MakePermutation(i, m); 
MakePermutation(m+l, j); 
END PROC 

Figure 3.7: Optimal Benefit Sort Orders for a Path 

odd-ben(ST) and ben(S e ) > even-ben (St)- Since the total benefit of the optimal solution 
ben(Sr) = odd-ben(Sx) + even-ben(ST), we have ben(S Q ) + ben(S e ) > ben(Sx)- Hence 
at least one of ben(S Q ) or ben(S e ) is > 1/2 ben(Sx)- There may be vertices not included 
in the chosen solution, e.g., the even level split in Figure 13781 does not include the root 
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Figure 3.8: A 1/2 Benefit Approximation for Binary Trees 

and leaf nodes. For these left over vertices arbitrary permutations can be chosen. Kenkre 
and Vishwanathan [28j have subsequently improved upon our result and have given a 
1 l ° 9 log 9 log n f ac tor approximation. 



3.3 Optimization with Favorable Orders 

The benefit model we presented in the previous section, does not take into account factors 
such as the physical sort order of a relation, available indices and size of base relations 
and intermediate results. Moreover, we assumed that the join order is fixed. In this 
section, we present a two phase approach to address the more general problem. In phase- 
1, which occurs during plan generation, we use the information about available indices 
and properties of physical operators to efficiently compute a small set of promising sort 
orders to try. We formalize this idea through the notion of favorable orders. Phase-2, is a 
plan refinement step and occurs after the optimizer makes its choice of the best plan. In 
phase-2, the sort orders chosen by the optimizer are refined further to reap extra benefit 
from the attributes common to multiple joins. Phase-2 uses the 1/2 benefit approximation 
algorithm of Section 13.2.31 



3.3.1 Favorable Orders 

Given an expression e, we expect some sort orders (on the result of e) to be producible 
at much lesser cost than other sort orders. Available indices, indexed materialized views, 
specific rewriting of the expression and choice of physical operators determine what sort 
orders are easy to produce. To account for such orders, we introduce the notion of favorable 
orders. In the discussion that follows, we use the following notations: 

56 



• cbp(e, o) : Cost of the best plan for expression e with o being the required output 
sort order 

• or : The clustering order of relation R 

• idx(R) : Set of all indices over R 

• o(I) : Order (key) of the index / 

• (s) : An arbitrarily chosen permutation of set s 

• P(s) : Set of all permutations of set s 

• schema(e) : The set of attributes in the output of e 

We first define the benefit of a sort order o w.r.t. an expression e as follows: 

benefit(o, e) = cbp(e, e) + coe(e, e, o) — cbp(e, o) 
Intuitively, a positive benefit implies the sort order can be obtained with lesser cost than 
a full sort of unordered result. For instance, the clustering order of a relation r will have 
a positive benefit for the expression cr p (r). Similarly, query covering secondary indices and 
indexed materialized views can yield orders with positive benefit. We call the set of all 
orders, on schema(e) , having a positive benefit w.r.t. e as the favorable order set of e and 
denoted it as ford(e). 

ford(e)— { o: benefit(o, e)> } 
Minimal Favorable Orders 

The number of favorable orders for an expression can be very large. For instance, every 
sort order having the clustering order as its prefix is a favorable order. We call a sort 
order o G ford(e) as a minimal favorable order if the following two conditions hold. 

1. o' e ford(e) such that d <o and cbp(e,o') + coe(e,o',d) = cbp(e,d). Intuitively, 
sort order o is minimal only if there does not exists a sort order d such that the 
cost of obtaining order o equals the cost of obtaining sort order d followed by an 
explicit sort to obtain order o. 

2. /9 o" G ford(e) such that o < o" and cbp(e,o") = cbp(e,o). Intuitively, sort order 
o is minimal only if there does not exists a sort order o" subsuming order o and 
available at the same cost as o. 
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We call the set of all minimal favorable orders of an expression e as the minimal 
favorable order set of e and denote it by ford-min(e). Conditions 1 and 2 above, ensure 
that when a relation has an index that provides sort order o efficiently, orders that are 
prefixes of o and orders that have o as their prefix are not minimal favorable orders. 
We define favorable orders of an expression w.r.t. a set of attributes s as: ford( e, s)= {oAs: 
o G ford(e)}. Intuitively, ford(e, s) is the set of orders on s or a subset of s that can be 
obtained efficiently Similarly, the ford-min of an expression w.r.t. a set of attributes s is 
defined as: ford-min(e, s)= {o A s : o G ford-min(e)} 

Heuristics for Favorable Orders 

Note that the definition of favorable orders uses the cost of the best plan for the expression. 
However, we need to compute the favorable orders of an expression before the expression 
is optimized and without requiring to expand the physical plan space. Further, the size 
of the exact ford-min of an expression can be prohibitively large in the worst case. In this 
section, we describe a method of computing approximate ford-min, denoted as afm, for 
SPJG expressions. We compute the afm of an expression bottom-up. For any expression 
e, afm(e) is computable after the afm is computed for all of e's inputs. 

1. e = R, where R is a base relation or materialized view. We include the clustering 
order of R and all secondary index orders such that the index covers the query. 
afm(R) = {o : o = Or or o = o(I), I G idx(R) and I covers the query} 

2. e = (7p(ei), where e\ is an arbitrary expression. 
afm(e) = {o : o G afm(e\) } 

3. e = Ili(ei), where e\ is any expression. We include longest prefixes of input favor- 
able orders such that the prefix has only the projected attributes. 

afm(e) = {o : 3o' G afm(e 1 ) and o = d A L} 

4. e = e\ N e 2 with join attribute set S = {a±, 02, ... , a n }. Noting that nested loops 
joins propagate the sort order of one of the inputs (outer) and merge join propagates 
the sort order chosen for join attributes, we compute the afm as follows: first, we 
include all sort orders in the input afms, next, we consider the longest prefix of each 
input favorable order having attributes only from the join attribute set and extend 
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it to include an arbitrary permutation of the remaining join attributes. 
Let T —afm(ex) U a/m(e 2 ) 

Then, afm(e x N e 2 ) = T U {o : d G T U {e} and o = d A S + (S-attrs(d A 5))} 

Note that, for the join attributes not involved in an input favorable order prefix 
(i.e., S—attrs(d A S)), we take an arbitrary permutation. An exact ford-min would 
require us to include all permutations of such attributes. In the post-optimization 
phase, we refine the choice made here using the benefit model and algorithm of 
Section 13.2.31 

5. e = L Q F (ei) 

afm(e) = {o : d E afm(e 1 )U{e} and o = d AL + (L-attrs(d A L))} 
Intuitively, for each input favorable order we identify the longest prefix with at- 
tributes from the projected group-by columns and extend the prefix with an arbi- 
trary permutation of the remaining attributes. 

3.3.2 Optimizer Extensions 

We make use of the approximate favorable orders during plan generation (phase- 1) to 
choose a small set of promising interesting orders for sort-based operators. We describe 
our approach taking merge join as an example but the approach is applicable to other 
sort based operators. In phase-2, which is a post-optimization phase, we further refine 
the chosen sort orders. 

Plan Generation (Phase-1) 

Consider an optimization goal of expression e = q n e r and required output sort order 
o. When we consider merge-join as a candidate algorithm, we need to generate sub-goals 
for ei and e r with the required output sort order being some permutation of the join 
attributes. 

Let S be the set of attributes involved in the join predicate. We consider only 
conjunctive and equality predicates. We compute the set X(e, o) of interesting orders as 
follows. 
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Steps to compute Z(e,o): 

1. Collect the favorable orders of inputs plus the required output order 

T(e, 6) =afm(ei, S) U afm(e r , S) U o A S, where afm(e, S) = {d A S : d G afm(e)} 

2. Remove redundant orders 

If oi, o 2 G T(e, o) and c»i < o 2 , remove Oi from T(e, o) 

3. Compute the set X(e, o) by extending each order in T(e, o) to the length of \S\; the 
order of extra attributes can be arbitrarily chosen 

X(e, o) = {o : d G T(e, o) and o = d + (S — attrs(o'))} 

We then generate optimization sub-goals for ei and e r with each order d G X(e, o) 
as the required output order and retain the cheapest combination. 

An Example: Consider Example 13.11 of Section 13.11 For brevity, we refer to the two 
catalog tables as ctl and ct2, the rating table as rt, and the columns with their starting 
letters. The afms computed as described in Section f3. 3. II follows: 
afm(ctl) = {(y)}, afm(ct2) = {(m)}, afm{rt) = {(m)}, afm(ctl n ct2) = {(y), (m), 
(y,co,c,m), (m,co,c,y)}, afm((ctl n ct2) m rt) = {(y),(m), (y,co,c,m), (m,co,c,y), 
(y,m), (m,y)} 

For (ctl N ct2) N rt we consider two interesting sort orders {(y,m), (m,y)} and for 
ctl xi ct2 we consider four sort orders {(y, co, c, m), (m, co, c, y), (y, m, co, c), (m, y, co, c)}. 
As a result the optimizer will consider the plan shown in Figure 13.21 

A Note on Optimality: If the set X(e, o) is computed using the exact set of minimal 
favorable orders (ford-miri), then it must contain an optimal sort order, i.e., a sort order, 
which produces the optimal merge join plan in terms of overall plan cost. 

Theorem 3.6 The set X(e, d) computed with exact ford-min contains an optimal sort 
order o p for the optimization goal e = (e/ x e r ) with (o) as the required output sort order, 
under Assumption A. 

Assumption A : If o\, 02 are two sort orders on the same set of attributes (i.e., attrs(oi) = 
attrs(o2)), then the CPU cost of sorting the result of an expression e to obtain 0\ will be 
same as that for o 2 , i-e., cpu-cost(e, 01) = cpu-cost(e, o 2 ). 
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The theorem essentially states the following: to identify an optimal sort order, it is 
sufficient to consider only the minimal favorable orders and not the full set of favorable 
orders. Appendix |A] gives the proof of Theorem 13.61 



Plan Refinement (Phase-2) 

During the plan refinement phase, for each merge-join node in the plan tree, we identify 
the set of free attributes, the attributes which were not part of any of the input favorable 
orders. Note that for these attributes we had chosen an arbitrary permutation while 
computing the afm (Section 13.3. ip . We then make use of the 1/2 benefit approximation 
algorithm for trees (described in Section 13.2.31) and rework the permutations chosen for 
the free attributes. 

Formally, let Pi be the permutation chosen for the join node t>j. Let be the 
order such that g; e afm(vi. left-input) U afm(vi. right-input) and \pi A <&| is the maximum. 
Intuitively, qi is the input favorable order sharing the longest common prefix with pi. Let 
fi =attrs(pi — (pi Aqi)); is the set of free attributes for w,. 

We now construct a binary tree, where each node rii corresponding to join-node V{ is 
associated with the attribute set /j. The attribute permutations for the nodes are chosen 
using the 1/2 benefit approximation algorithm; the chosen sort order for free attributes 
is then appended to the sort order chosen during plan generation {i.e., pi A qi) to get a 
complete order. 

The reworking of the sort orders will be useful only if the adjacent nodes share the 
same prefix, i.e., PiAqi was the same for adjacent nodes. This condition however certainly 
holds when the inputs for joins have no favorable orders. 

Figure 13.91 illustrates the post-optimization phase. Assume all relations involved 
(Ri, . . . , R4) are clustered on attribute a and no other favorable orders exist, i. e., afm(Ri) = 
{(a)}, for % = 1 to 4. The orders chosen by the plan generation phase are shown besides 
the join nodes with free attributes being in italics. The reworked orders after the post- 
optimization phase are shown underlined. 
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Figure 3.9: Post-Optimization Phase 



3.4 Application to Nested Queries 



The problem of choosing interesting sort orders on sub-query parameters, discussed in 
Section 12.31 is similar to the problem of choosing sort orders on inputs. We now briefly 
describe how the techniques presented in this chapter can be used to compute interesting 
sort orders on sub-query parameters. 

Let C2 be an expression nested below expression e± and S be be the set of parameters 
bound by e\ that are used inside e 2 . First, we compute the approximate favorable orders of 
appropriate sub-expressions in e 2 and deduce the corresponding sort order on parameters. 
As an example, consider a sub-expression a p {ez) in e^- We compute afm{ez) and then 
deduce the corresponding sort orders on parameters by mapping each attribute to the 
parameters it is equated to in predicate p. Next, we compute the favorable orders for 
the outer expression w.r.t. to the set of parameters it binds, afm{e\, S). The final set of 
interesting parameter sort orders is formed by taking the union of interesting parameter 
sort orders obtained from all the inner sub-expressions and the favorable orders of the 
outer sub-expression. 

Intuitively, our approach considers plans that sort the outer tuples to match a sort 
order favorable to the inner expression(s), and also plans that sort the inner relations to 
match a sort order favorable to the outer expression. 

3.5 Experimental Results 

We performed experiments to evaluate the benefits due to the proposed ideas. For com- 
parison, we use PostgreSQL (version 8.1.3) and two widely used commercial database 
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systems (we call them SYS1 and SYS2). All tests were run on an Intel P4 (HT) PC with 
512 MB of RAM. We used TPC-H 1GB dataset and additional tables as specified in the 
individual test cases. For each table, a clustering index was built on the primary key. 
Additional secondary indices built are specified along with the test cases. All relevant 
statistics were built and the optimization level for one of the systems, which supports 
multiple levels of optimization, was set to the highest. 

3.5.1 Modified Replacement Selection 

The first set of experiments evaluate the benefits of exploiting partial sort orders. Exter- 
nal sort in PostgreSQL employs the standard replacement selection (SRS) algorithm [52] 
suitably adapted for variable length records. We modified this implementation to ex- 
ploit partial sort orders available on the input (as described in Section I3TT]) . and we call 
it Modified Replacement Selection (MRS). We now present experiments comparing the 
performance of MRS with SRS. 

Experiment Al 

The first experiment consists of a simple ORDER BY of the TPC-H lineitem table on 
two columns (Lsuppkey, Lpartkey). 

Example 3. 2 An Order-By Query on Two Columns ( Query 1 ) 

SELECT Lsuppkey, Lpartkey 

FROM lineitem 

ORDER BY Lsuppkey, Lpartkey; 



A secondary index on Lsuppkey was available that covered the query (included the 
Lpartkey column )[]. On all three systems, the order by on (Lsuppkey, Lpartkey) took 
almost the same time as an order by on (Lpartkey, Lsuppkey) showing that the sort 
operator of these systems did not exploit partial sort orders effectively. We then compared 
the running times with our implementation that exploited partial sort order (Lsuppkey) 
and the results are shown in Figure 13.101 

3 On systems not supporting indexes with included columns, we used a table with only the desired two 
columns, clustered on Lsuppkey 
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For SYS1 and SYS2, as we did not have access to their source code, we simulated 
the partial sorting using a correlated rank query. The subquery sorted the index entries 
matching a given Lsuppkey on Lpartkey and the subquery was invoked with all suppkey 
values so as to obtain the desired sort order of (Lsuppkey, Lpartkey). 
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By avoiding run generation I/O and making reduced comparisons, MRS performs 
3-4 times better than SRS. 



Experiment A2 

The second experiment shows how MRS is superior in terms of its ability to produce 
records early and uniformly. Table R% having 3 columns (cl,c2,c3) was populated with 
10 million records and was clustered on (cl). The query asked an order by on (cl,c2). 
Figure 13.111 shows the plot of number of tuples produced vs. time with cardinality of 
cl = 10,000. 

MRS starts producing the tuples without any delay after the operator initialization 
where as SRS produces its first output tuple only after seeing all input tuples. By pro- 
ducing tuples early, MRS speeds up the pipeline significantly. Such early output behavior 
is highly desirable for Top-K queries. 

Experiment A3 

The third experiment shows the effect of partial sort segment size on sorting. 8 tables 
R , . . . , R 7 , with identical schema of 3 columns (cl, c2, c3) were each populated with 10 
million records and average record size of 200 bytes. Each table was clustered on (cl). 
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Table Ri had 10* tuples for each distinct value of cl {i.e., uniform distribution over 10 7_ * 
distinct values of cl), resulting in a partial sort segment size of 200 x 10* bytes. Thus 
Rq had a sort segment of size 1 tuple or 200 bytes, and Rj had a sort segment of size 10 
million tuples or 2GB. The query had an order by on (cl,c2). The running times with 
default and modified replacement selection on PostgreSQL are shown in Figure 13.121 



select * from R order by d , c2; (R pre-sorted on d) 
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Figure 3.12: Effect of Partial Sort Segment Size 



When the partial sort segment size is small enough to fit in memory (up to 10MB 
or 50K records), SRS produces a single sorted run on disk and does not involve merging 
of runs. The modified replacement selection (MRS) gets the benefit of avoiding I/O and 
reduced number of comparisons. When the partial sort segment size becomes too large to 
fit in memory, we see a sudden rise in the time taken by SRS. This is because replacement 
selection will have to deal with merging several runs. MRS however deals with merging 
smaller number of runs initially as each partial sort segment is sorted separately. As the 
partial sort segment size increases, the running time of MRS rises and becomes same as 
that of SRS at the extreme point where all records have the same value for cl. 



Experiment A4 

To see the influence of MRS on a query with joins and aggregates, we considered the 
query shown in Example 13.31 The query finds the number of lineitems and available 
quantity for each supplier, part pair. The supplier and part key columns were common 
to the join, group-by and the order- by clauses. Two indices, lineitem(Lsuppkey) and 
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partsupp (pssuppkey), were present and covered the query. The indices were thus useful 
to obtain part of the desired sort order (suppkey, partkey). 



Example 3.3 Number of lineitems for each (supplier, part) pair (Query 2) 

SELECT pssuppkey, psjpartkey, ps^availqty, count (Lpartkey) 
FROM partsupp, lineitem 

WHERE pssuppkey=Lsuppkey AND psjpartkey =Lpartkey 
GROUP BY ps suppkey, psjpartkey, ps-availqty 
ORDER BY ps suppkey, psjpartkey; 



On PostgreSQL the query took 63 seconds to execute with SRS, and 25 seconds with 
MRS. The query plan used in both cases was the same: a merge join of the two relations 
on (suppkey, partkey) followed by aggregation. 

3.5.2 Choice of Interesting Orders 

We extended our Volcano-style cost based optimizer, which we call PYRO, to consider 
partial sort orders and to use the proposed method for choosing sort orders for merge 
joins and aggregation. We compare the plans produced by the extended implementation, 
which we call PYRO-0, with those of PostgreSQL, SYS1 and SYS2. 

Experiment Bl 

For this experiment we used the query shown in Example 13.41 below, which lists parts for 
which the outstanding order quantity is more than the stock available at the supplier. 

Example 3-4 Parts Running Out of Stock (Query 3) 

SELECT ps_suppkey, psjpartkey, ps-availqty, sum(Lquantity) 
FROM partsupp, lineitem 

WHERE pssuppkey=Lsuppkey AND psjpartkey=ljpartkey AND Llinestatus='0' 

GROUP BY ps-availqty, psjpartkey, pssuppkey 

HAVING sum(Lquantity) > ps^availqty ORDER BY psjpartkey; 



Table partsupp had clustering index on its primary key (psjpartkey, pssuppkey). 
Two secondary indices, one on psjsuppkey and the other on Ijsuppkey were also built 
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on the partsupp and lineitem tables respectively. The two secondary indices covered all 
attributes needed for the query 

The experiment shows the need for cost-based choice of interesting orders. The 
choice of interesting orders for the join and aggregate are not obvious in this case for the 
following reasons: 

1. The order- by clause favors the choice of a sort order where partkey appears first. 

2. The clustering index on partsupp favors the choice of (partkey, suppkey). 

3. The secondary indices favor the choice of (suppkey, partkey) that can be obtained 
by using a low cost partial sort. Note that this option can be much cheaper due to 
the size of the lineitem relation. 

Therefore, the optimizer must make a cost-based decision on the sort order to use. 
Figures EI3] and EH show the plans chosen by PostgreSQL, PYRO-O, SYS1 and SYS2. 

All plans except the hash-join plan of SYS1 and the plan produced by PYRO-0 use 
an expensive full sort of 6 million lineitem index entries on (Lpartkey, Lsuppkey). Further, 
PostgreSQL uses a hash aggregate where a sort-based aggregate would have been much 
cheaper as the required sort order for the group-by was available from the output of 
merge-join. Note that the sort order (psjpartkey, pssuppkey, ps -availability), required by 
the group-by, can be inferred from the sort order (psjpartkey, pssuppkey), available on 
the result of merge-join, due to the presence of the functional dependency {psjpartkey, 
pssuppkey} — > {ps-availqty} . On SYS1, it was possible to force the use of a merge-join 
instead of hash-join and the plan chosen is shown in Figure 13.141 (b). 

We compared the actual running time of PYRO-O's plan with those of PostgreSQL 
and SYS1 by forcing our plan on the respective systems. Figures [3.151 and 13.161 show the 
execution times. It was not possible for us to force our plan on SYS2 and make a fair 
comparison and hence we omit the same. The only surprising result was the default plan 
chosen by SYS1 performed slightly poorer than the forced merge-join plan on SYS1. In all 
cases, the forced PYRO-0 plan performed significantly better than the other plans. The 
main reason for the improvement was the use of a partial sort of lineitem index entries as 
against a full sort. The final sort on partkey was not very expensive as only a few tuples 
needed to be sorted. 
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Figure 3.13: Plans for Query 3 (PostgreSQL and PYRO-O) 
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Figure 3.14: Plans for Query 3 (SYS1 and SYS2) 



For Query 3 (Example 13. 4p the plan generation phase (phase- 1) was sufficient to 
select the sort orders and phase-2 does not make any changes. We shall now see a case 
for which phase- 1 cannot make a good choice and the sort orders get refined by phase-2. 



Experiment B2 

This experiment uses the query shown in Example I3.5[ which has two full outer joins with 
two common attributes between the joins. We performed this experiment to see whether 
the systems we compare with are designed to exploit attributes common to multiple sort- 
based operators. 

The tables Rl, R2 and R3 were identical and each populated with 100,000 records. 
No indexes were built. As shown in Figure l3TT7T a). both SYS1 and PostgreSQL chose 
sort orders that do not share any common prefix. The plan chosen by PYRO-O is shown 
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Figure 3.15: Performance on PostgreSQL 
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Figure 3.16: Performance on SYS1 



Example 3.5 A query with attributes common to multiple joins ( Query 4 ) 

SELECT * FROM Rl 

FULL OUTER JOIN R2 

ON (Rl.c5=R2.c5 AND Rl.c4=R2.c4 AND Rl.c3=R2.c3) 
FULL OUTER JOIN R3 

ON (R3.cl=Rl.cl AND R3.c4=Rl.c4 AND R3.c5=Rl.c5); 



in Figure 15.17( b). In the plan chosen by PYRO-O, the two joins share a common prefix 
of (c4, c5), and thus the sorting effort is expected to be significantly less. SYS2, not 
having an implementation of full outer join, chose a union of two left outer joins. The two 
left outer joins used to get a full outer join used different sort orders making the union 
expensive, illustrating a need for coordinated choice of sort orders. The execution timings 
for Query 4 on PostgreSQL and SYS1 are shown in Figures 13.151 and 13.161 respectively 

Experiment B3 

In this experiment we compare our approach of choosing orders, PYRO-O, with the 
exhaustive approach, and a heuristic used by PostgreSQL. PostgreSQL uses the following 
heuristic: for each of the n attributes involved in the join condition, a sort order beginning 
with that attribute is chosen; in each order, the remaining n — 1 attributes are ordered 
arbitrarily. We implemented PostgreSQL's heuristic in PYRO along with the extensions to 
exploit partial sort orders and we call it PYRO-P. The exhaustive approach, called PYRO- 
E, enumerates all n\ permutations and considers partial sort orders. In addition, we also 
compare with baseline PYRO, which chooses an arbitrary sort order, and a variation of 
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Figure 3.17: Plans for Query 4 

PYRO-0, called PYRO-0 - that considers only exact favorable orders (no partial sort). 
Figure 13.181 shows the estimated plan costs. Note the logscale for y-axis. The plan costs 
are normalized taking the plan cost with exhaustive approach to be 100. In the figure, Q3 
and Q4 stand for Query 3 (Example 13.41) and Query 4 (Example 13.51) of Experiments Bl 
and B2. Q5 and Q6 stand for Query 5 and Query 6, and are shown below as Examples 13.61 
and 13.71 respectively. For Q3 and Q4, as very few attributes were involved in the join 
condition, PostgreSQL's heuristic along with extensions to exploit partial sort orders, 
produced plans which were close to optimal. However, for more complex queries the 
heuristic does not perform well since it makes an arbitrary choice for secondary orders. 



Example 3.6 Total value executed for a given order (Query 5) 



SELECT Tl.Userld, Tl.Basketld, Tl.ParentOrderld, Tl.Waveld, Tl.ChildOrderld, 

(Tl. Quantity * Tl. Price) as OrderValue, 

SUM(T2. Quantity * T2. Price) as ExecValue 
FROM TRAN Tl, TRAN T2 

WHERE Tl. UserId=T2. Userld AND Tl.ParentOrderId=T2.ParentOrderId AND 
Tl.BasketId=T2.BasketId AND Tl. WaveId=T2. Waveld AND 
Tl . ChildOrderId= T2. ChildOrderld AND Tl . Tran Type= New 7 AND 
T2. TranType= Executed ' 

GROUP BY Tl.Userld, Tl.Basketld, Tl.ParentOrderld, Tl.Waveld, Tl.ChildOrderld; 
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Example 3. 7 Basket Analytics ( Query 6) 



SELECT * FROM BASKET B, ANALYTICS A 
WHERE B.ProdType = A.ProdType AND B. Symbol 
B. Exchange = A. Exchange; 
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Figure 3.18: Evaluating Different Heuristics Figure 3.19: Optimizer Scalability 



3.5.3 Optimization Overheads 

The optimization overheads due to the proposed extensions were negligible. During plan 
generation, the number of sort orders we try at each join or aggregate node is of the order 
of the number of indices that are useful for answering the query, which in most practical 
case is expected to be small. Figure 13.191 shows the scalability of the three heuristics. For 
this experiment a query that joined two relations on varying number of attributes was 
used. Though PYRO-P and PYRO-O take the same amount of time in this experiment, 
in most cases, the number of favorable orders is much less than the total number of 
attributes involved and hence PYRO-O generates fewer interesting orders than PYRO-P. 

The plan-refinement algorithm presented in Section 13.2.31 was tested with trees up 
to 31 nodes (joins) and 10 attributes per node. The time taken was negligible in each 
case. The execution of plan refinement phase took less than 6 ms even for the tree with 
31 nodes. 

Both the optimizer extensions and the extension to external-sorting (MRS) were 
straight forward to implement. The optimizer extensions neatly integrated into our ex- 
isting Volcano style optimizer. 
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3.6 Related Work 



Both System R [48] and Volcano [23] optimizers consider plans that could be locally sub- 
optimal but provide a sort order of interest to other operators, and thus yield a better 
plan overall. However, both System R and Volcano assume that operators have one or 
few exact sort orders of interest. This is not true of operators like merge-join, merge- 
union, grouping and duplicate elimination, which have a factorial number of interesting 
orders. Heuristics such as the PostgreSQL heuristic, are commonly used by optimizers. 
Details of the heuristics are publicly available only for PostgreSQL. Further, System R 
and Volcano optimizers consider only those sort orders as useful that completely meet an 
order requirement. Plans that partially satisfy a sort order requirement are not handled. 
In this chapter we addressed these two issues. 

The seminal work by Simmen et.al. [51] describes techniques to infer sort orders 
from functional dependencies and predicates applied, and thereby avoids redundant sort 
enforcers in the plan. Simmen et.al. [JT] briefly mention the problem of non-exact sort 
order requirements and mentions an approach of propagating an order specification that 
allows any permutation on the attributes involved. Though such an approach is possible 
for single input operators like group-by, it cannot be used for operators such as merge-join 
and merge-union for which the order guaranteed by both inputs must match. Moreover, 
the paper does not make it clear how the flexible order requirements are combined at 
other joins and group-by operators. Simmen et.al. [51] also note that the approach of 
carrying a flexible order specification increases the complexity of the code significantly. 
Our techniques do not use flexible order specifications and hence can be incorporated 
into an existing optimizer with minimal changes. Further, our techniques work uniformly 
across all types of operators that have a flexible order requirement. 

Claussen et.al. |7] explore early-sorting as a means to reduce sorting cost in query 
plans. The key idea is to avoid sorting of large intermediate join results by pushing sorting 
to base relations, and using order preserving hash joins. There has been significant work 
on avoiding redundant sorting by inferring sort orders and groupings using functional 
dependencies [511 ESI [391 HQ] • These techniques are complementary to our work. 
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3.7 Summary 



In this chapter we addressed the problem of choosing efficient sort orders for sort-based 
operators such as merge-join and sort-based grouping. We showed that even a simplified 
version of the problem is NP-Hard, and proposed principled heuristics for choosing inter- 
esting orders. Our heuristics are guided by the notion of favorable orders. We take into 
account important issues such as partially matching sort orders and attributes common 
to multiple operators. We then explained how the solution can be used for choosing ef- 
ficient parameter sort orders for nested queries. We presented a detailed experimental 
study on widely used database systems, and the results showed significant performance 
improvements due to the proposed techniques for several queries. 
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Chapter 4 

Rewriting Procedures for Batched 
Bindings 

In this chapter we consider parameter batching as a means of improving performance of 
iteratively invoked database procedures. Several data retrieval and update tasks need 
more expressive power than what standard SQL offers. Therefore, many database appli- 
cations perform queries and updates from within procedural code that encodes business 
logic. Stored procedures and user-defined functions written using procedural extensions 
to SQL (e.g., PL/SQL [35] , PL/pgSQL [JT]) are widely used. Other paradigms that allow 
mixing of procedural constructs with database access are SQL extensions to procedural 
languages, e.g., SQL J [53] and Microsoft's language integrated query (LINQ) [35J, and 
application programming interfaces (API) for database access, e.g., JDBC, ODBC. Such 
procedures/programs can run either inside a database system, as stored procedures or 
user-defined functions (UDF), or outside the database system. As queries in SQL can in 
turn invoke UDFs (as part of WHERE/SELECT clauses), control can alternate between 
the SQL execution engine and the procedural code. 

We have earlier seen an example (Example 11.31 in Chapter [1]) of a query that invokes 
a simple UDF in its WHERE clause. In general, UDFs/procedures can be more complex 
with arbitrary control-flow and looping. Such a UDF is shown in Example 14.11 where the 
UDF counts the number of items in a given category and all its sub-categories. 

Parameter batching is an important technique to speedup iterative execution of 
queries and updates [221 EH] • Parameter batching allows the choice of efficient set-oriented 
plans for queries and updates, thus reducing random 10. For inserts and updates batching 
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Example 4-1 Query with a Complex UDF - Items in a Category and Sub-Categories 



SELECT * FROM category WHERE count .items (category Jd) > f (level); 

INT count_items(INT catid) 
DECLARE 

INT totalcount; INT curcat; INT catitems; INT subcat; 
INT stack[100]; INT top; RECORD catrec; 
BEGIN 



si: totalcount := 0; 

s2: top := 0; 

s3: stackftop] = catid; 

s4: top := top + 1; 

s5: WHILE top > LOOP 

s6: top := top — 1; 

s7: curcat := stack[top]; 

s8: catitems := SELECT count(itemJd) FROM item WHERE categoryJd = curcat; 

s9: totalcount := totalcount + catitems; 

// Now push all the subcategory ids onto the stack 
slO: FOR catrec IN SELECT categoryJd FROM category 

WHERE parent_category=curcat LOOP 
sll: stack[top] := catrec. categoryJd; 

sl2: top := top + 1; 

END LOOP; 
END LOOP; 
sl3: RETURN totalcount; 



END; 



allows efficient integrity checks and index maintenance. When the calls span across a 
network, batching also helps in reducing network round-trip delays. 
In this chapter we present our work on automatic rewrite of iterative programs to fulfill 
the following needs: 

(a) Rewrite UDFs and stored procedures to accept batched bindings 

(b) Rewrite programs that repeatedly execute a parameterized query or stored procedure 
to use batched invocation when possible 

The rest of this chapter is organized as follows. Section 14.11 defines the problem, in- 
troduces the notion of batched forms and batch-safe operations. We identify the program 
transformation goal of pulling expensive operations out of loops as a key for addressing 
the two needs mentioned above. Section 14.21 describes the background needed for our 
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approach. In Section 14.31 we present the set of program transformation rules, which to- 
gether achieve the required program transformation, when the program satisfies certain 
conditions. The program transformation rules rely on the data dependence graph of the 
given program and can work with complex procedures such as the ones shown in Exam- 
ple 11.31 and Example 14.11 We present the results of our experiments in Section 14.61 The 
experiments are based on real-life examples of performance problems. We discuss related 
work in Section 14.71 and summarize in Section 14.81 



4.1 Rewriting for Batched Bindings 

In order to exploit the benefits of batching, we must have an efficient batched form of 
the operation being invoked, and a way of using the batched form in place of repeated 
invocations of the operation. In this section, we formally define batched forms of operations 
and introduce the problem of rewriting loops so as to make use of the batched forms 
of expensive operations invoked within them. We also consider the issues in batching 
invocations of operations that have side-effects. Finally, we introduce the problem of 
generating batched forms of complex procedures. 

4.1.1 Batched Forms of Operations 

Informally, the batched form of an operation takes a batch (or set) of parameters at once 
and processes them. Batched forms of operations are typically more efficient than iterative 
invocation of the corresponding non-batched forms. For example, a database bulk load 
operation can be thought of as a batched form of the insert operation, assuming logging 
can be ignored. Similarly, a relational join can be thought of as a batched form of relational 
selection with a parameterized predicate. Note that we model a batch as a set and not a 
sequence. This is due to the fact that most batched forms do not guarantee the order in 
which the elements in the batch are processed, and this is an important reason for their 
efficiency. We now define the batched forms formally. 

Batched Forms of Pure (Side-Effect Free) Functions 

Let / : D — > R be a side-effect free function, where D is the domain and R is the range 
of /. A function fb : BD — ► BR is considered a batch form of / if the following are true. 
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1. The domain BD is the power set of D 



2. The range BR is the power set of D x R 



3. \/beBD,fb(b) = [j heb {(b l ,f(b i ))} 



Example: Consider the square function defined as sq(x) = x 2 . The corresponding 
batched form can be defined as sqb(sx) = {(x, x 2 ) : x G sx} 

Intuitively, the batched form of a function takes a set of parameters and returns a 
set comprising of all the results. To establish the correlation between a parameter and 
the corresponding result we require the batched form to return the parameter value along 
with the result. 

Batched Forms of Parameterized Relational Queries 

Relational queries are pure functions that return (multi)sets of tuples. Though we can 
use the above definition of batched forms for queries, it makes the return type of batched 
queries violate the first normal form (INF) as queries may have set- valued return type. We 
desire the first normal form on batched queries so as to be able to make our techniques 
easily implementable in existing relational query processing systems. Hence we use a 
slightly modified definition for the batched form of a query. 

Let q(pi, P2, ■ ■ ■ , Pn) be a query with n parameters. Let v±, i>2, . . . , v m be the attributes 
in the result-set that q returns. The batched form qb of q takes a set p of n-tuples as its 
parameter (each n-tuple gives a binding for the parameters Pi,P2, ■ ■ ■ ,p n )- The result-set 
of qb contains the union of g's results for all the parameter tuples in p. Each tuple in qb's 
result-set contains n + m attributes Pi,P2, ■ ■ ■ ,p n , ^1,^2, • • • , v m . Often only a subset of 
the parameters are sufficient to establish the correlation with the corresponding results. 
However, for simplicity, we assume the batched form returns all the parameter values 
along with the results. Formally, 



When the result of q is an empty set for any parameter binding, the result-set of qb contains 
a tuple corresponding to the specific parameter binding but the attributes v±, v 2 , ■ ■ ■ ,v n 
will be set to null. 




{{p t } x q(p t )} if q(p t ) ^ 
{{pt} x {null-m}} otherwise 
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Example: Consider the following parameterized select-project query: 



q(custid) = U ordrid ( 

O " customer-id=custid 

(ORDERS)) 

The corresponding batched form can be defined as: 

qb(cs) = ^-(custid,ordrid){ cs custid=customer-id ORDERS), 

where cs is the parameter relation having the attribute custid. 

Batched forms of relational queries have been used in the context of query decor- 
relation [4"9"j [T7J [H]. As shown in the above example, batched forms of simple SPJ 
queries use a join or an outer join. Batched forms of aggregate queries either use grouping 
followed by join or an outer-join followed by grouping. The details of deriving correct 
batched forms of SQL queries can be found in the literature on decorrelation. Exam- 
ple 14.21 shows the batched forms of queries ql and q2 used inside the UDF of Example 11.31 
in Chapter [U 

Example 4-2 Batched Forms of Queries in Example \1.3\ 
qlb(r): The batched form of query ql, with r as the parameter batch 

SELECT r.curcode, c.exchrate FROM r JOIN curexch c ON r.curcode=c.ccode; 

q2b(r): The batched form of query q2, with r as the parameter batch 

SELECT r.itemcode, r.amount-usd, count(b.itemcode) AS count-offers 
FROM r LEFT OUTER JOIN buyojfers b ON b.itemid = r.itemcode AND 

b. price >= r.amount-usd 
GROUP BY r.itemcode, r.amount-usd; 



Most database systems also support batched bindings for basic data manipulation 
operations like insert, delete and update. The insert into . . . select from . . . construct 
can be used as the batched form of insert operation. For updates, we can use the merge 
construct of SQL:2003 (or the update . . . from . . . construct of SQLServer), as the batched 
form. Batched forms of these operations employ various techniques such as set-oriented 
index update and set-oriented integrity checks to offer increased efficiency over the corre- 
sponding non-batched operations. 
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Operations having Side-Effects 

An operation with side-effects, in addition to returning a value, modifies the system state. 
Further, the return value may be a function of not only the arguments (parameters) 
but also the system state. We can model such an operation with a pair of functions, 
fv:SxD^R and fs:SxD-+S, where S is the set of all possible system states, D 
is domain of parameters and R is the domain of result values. Since we assume batched 
forms are free to process the arguments in any order, we can define the batched forms 
for only a restricted class of side-effect causing operations. We call this restricted class of 
operations, for which the batched forms are defined, as batch-safe operations. We call an 
operation batch-safe if the following conditions hold: 
When processing a set of arguments, 

1. the operation's return value, for any parameter, is independent of the order in which 
the parameters are processed. 

Vs G S,Wx,y G D,fv(S,x) = fv(fs(S,y),x) 

2. the final system state is independent of the order in which the arguments are pro- 
cessed. 

Vs G S,Vx,y G D, fs(fs(s,x),y) = fs(fs(s,y),x) 

For example, an INSERT operation on a table that has no constraints defined, is 
batch-safe. However, in the presence of table constraints (e.g., a unique column), the 
INSERT operation may or may not be batch-safe depending on the exact set of parameters. 
Operations that write to an external file/device or communicate with other systems may 
or may not be batch-safe depending on the specific application. If the programmer has 
enclosed an operation inside a loop that gives no guarantee of order, e.g., iteration over 
the result set of a query without order-by clause, we may treat the operation as batch-safe. 
Such analysis for batch-safety of an operation can be extended further, but is beyond the 
scope of this work. Also, in this chapter, we assume that operations with side-effects do 
not have a return value, they just cause a change in the system state. 

4.1.2 Rewriting Loops to Use Batched Forms 

Consider a statement that invokes operation q inside loop L of a program P. We say that 
the invocation of q is batchable w.r.t loop L if it is possible to rewrite P into an equivalent 
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program P' by removing the invocation of q from the body of the loop and making a 
single invocation of the batched form of q (i.e., qb) outside the loop. As an example, 
consider the query in Example 11.31 which invokes a UDF. A program corresponding to 
the iterative plan for this query is shown in Example 14.31 



Example 4-3 An Iterative Program/Plan for the Query in Example \1.3\ 

for each t in select orderid, itemcode, amount, curcode from sellorders where mkt='NSE' loop 
< body of the udf with parameters bound from t > 
if (return value > 0) 
output t. orderid; 

end loop 



In this example, both the queries inside the body of the UDF can be batched w.r.t 
the enclosing loop. Example 14.41 shows a rewritten form of the query and the UDF in 
Example 11.31 The rewritten query makes used of a batched form of the UDF, which is 
called count-offers-batched. The batched form of the UDF, shown in Example 14.41 takes a 
table-valued parameter. The table-valued parameter (or the batch) is constructed from all 
the distinct parameter values with which the non-batched form of the function would be 
invoked by the original query. The UDF count-offers _batched, makes use of a temporary 
table r2. Note that all the updates performed by the UDF are only on the temporary table. 
The UDF count_offersJ)atched in turn makes use of the set-oriented forms the queries ql 
and q2 . These set-oriented forms called ql b and q2b were shown in Example 14.21 The 
UDF uses the SQL MERGE construct to merge the results of the set-oriented queries with 
the temporary table r2, which stores the values of all the local variables and function 
parameters. In the batched version we still have an iterative loop (the only loop in the 
body of count-offers-batched), but it contains only inexpensive operations. It is possible 
to pull even such operations out of the loop, we discuss about this in Chapter [5j 

Even if an operation is batch-safe, it may not always be possible to batch a given 
invocation of the operation w.r.t a loop, because of side effects within the loop. In 
Section 14.31 we provide a set of program transformation rules that allow batching the 
invocations of an operation when the program satisfies certain conditions. 
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Example 4-4 Batched Form of the Query in Example 



Let rl = SELECT DISTINCT itemcode, amount, curcode FROM sellorders WHERE mkt='NSE'; 

Now, the query of Example \1.3\ can be written as: 

SELECT orderid FROM sellorder so, count_offers_batched(rl) br 

WHERE so.mkt='NSE' AND so. itemcode=br. itemcode AND so. amount=br. amount AND 
so. curcode=br. curcode AND br.count_offers > 0; 

where, count_offers_batched is the batched form of the UDF count_offers defined as follows. For brevity, 
we omit the schema details when it is obvious. 

TABLE count_offers_batched( / T,4.BL£ rl) 

DECLARE 

TABLE (itemcode, amount, curcode, condl, amount_usd, count-offers) rl; //A temporary table 
BEGIN 

FOR EACH tl IN rl LOOP 

FLOAT amount-usd; BOOLEAN condl; INT count_offers; 

condl := (tl. curcode == "USD"); 

condl == true 1 ? amount-usd := tl. amount; 

j I variables below take default values if unassigned 

r2.addRecord((tl. itemcode, tl. amount, tl. curcode, condl, amount-usd, count-offers) ) ; 
END LOOP 

MERGE INTO r2 USING qlb(el) AS qlb ON (r 2. curcode =qlb. curcode) 
WHEN MATCHED THEN UPDATE SET amount.usd = amount * qlb.exchrate; 

II where el is SELECT DISTINCT curcode FROM r2 WHERE condl=false; 
1 1 and qlb, the batched form of query ql, is shown in Example \4-2\ 

MERGE INTO r2 USING q2b(e2) AS q2b 

ON (r2.itemcode=q2b. itemcode AND r2.amount-Usd=q2b.amount_usd) 
WHEN MATCHED THEN UPDATE SET count.offers = q2b.count_offers; 

II where e2 is SELECT DISTINCT itemcode, amount.usd FROM r2; 
1 1 and q2b, the batched form of query q2, is shown in Example \4-2\ 

RETURN (SELECT itemcode, amount, curcode, count-offers FROM r2;) 
END 

Note: MERGE is a SQL: 2003 construct. 



4.1.3 Generating Batched Forms of Procedures 

To speed up applications or queries that make repeated calls to stored procedures or 
UDFs we need efficient batched forms of these procedures. However, batched forms of 
complex operations like stored procedures and UDFs are (as far as we know) not available 

82 



fb-trivial(pt) Apply(pt, f) 

where the function Apply is defined as : 

Apply(pt, f): 

r = {}; 

for each t in pt 

< body of f with parameters bound from attributes of t > 

rf= return value off, 

r.addRecords({t} x rf); 
return r; 

Figure 4.1: Trivial Batched Form of a Procedure 

unless implemented by the programmers manually. We therefore consider the problem of 
automatically generating batched forms of stored procedures and UDFs. Our goal is to 
generate efficient batched forms by batching the expensive operations within the body of 
the procedure/UDF. 

Given any side-effect free function or batch-safe operation (which could be a stored 
procedure) /, we can generate its trivial batched form as shown in Figure 14.11 Batched 
form of any procedure can thus be generated by enclosing it in a loop that iterates over 
the parameter set and executes the statements inside the procedure repeatedly. However, 
such a rewriting is not of significant benefit as cost(fb-trivial) for a batch size of k is 
nearly same as kxcost(f); Such a rewriting can still be useful in reducing round-trip 
delays in client server environments. More interesting batched rewrites are the ones that 
use specialized and efficient strategies for batch processing, e.g., batched selection within 
the procedure can be processed as a join, while a query in the procedure which performs 
a selection followed by an aggregate would have a batched form that employs grouping 
followed by join. 

To generate an efficient batched form of a procedure, we can start with the trivial 
batched form of the procedure and try to batch each expensive sub-operation in the body 
of the procedure w.r.t. to the enclosing loop. To do so, the sub-operation must be taken 
out of the enclosing loop and substituted by its batched equivalent. If the sub-operation 
is a query, its batched form may be known. If the sub-operation is a procedure call, we 
recursively invoke the method to generate the batched form of the called procedure. 

As we can see, generating batched form of a complex procedure reduces to the task 
of batching selected operations w.r.t. a loop (see Section |4.1. 2p . We address this problem 
in Section 14.31 after introducing some preliminaries in Section 14.21 
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4.2 Background for Our Work 



In this section, we formally outline the language constructs we support and provide back- 
ground material on the terminology used in this chapter. 

4.2.1 Language Constructs 

For our illustration, we use a simple procedural language. The language offers expressions, 
assignment, conditional branching and looping. The supported language constructs are 
briefly described below. 

• while loops are of the form w/w/eCpredicate) loop . . . end loop;. 

• cursor loops are of the form for each record in query/table loop . . . end loop; 
An order by clause can be present if the iteration is over a table. When present, 
the order by is assumed to be ascending by default. Unlike the more general and 
powerful while loops, the cursor loops iterate over the result of a query and hence 
their iteration space is known once the query is evaluated. 

• Branching is possible through if-then-else having the syntax if (predicate) {. . .} 
else {. . .}. 

• Scalar variables: we consider only scalar variables in our discussion. However, our 
techniques can be easily extended to handle arrays, records and collection types. 
Each looping block can have variables local to the block. Statements can access 
variables local to their block or variables defined in any of the ancestor blocks. 

• Result of scalar queries (queries that return exactly one tuple) can be assigned to 
variables. 

e.g., vi, v 2 , • • • , v n = select c 1 ,c 2 ,...,c n from . . .; 

Note that a single assignment can be used to simultaneously assign values to multiple 
variables. Set-valued queries can be used only in the context of cursor loops. 

We also use a few additional constructs in the transformed code. We assume these 
constructs are not available for the end-user and hence cannot be present in the input 
program. 
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• The TABLE type is used to construct the parameter batches. The TABLE type 
can be implemented so as to make use of both main memory and disk. 

• Updatable cursor loops are of the form for each record by ref in table loop . . . end 
loop; Any updates to the record modify the underlying table variable. 

• The transformed program may use relational operators such as selection, projection 
and join. 

Assumptions 

We make the following assumptions about the program. 

1. Unconditional control transfer statements like GOTO, EXIT and CONTINUE are 
not used. 

2. Statements have no hidden side-effects. Information about all reads and writes 
performed by a statement (either on memory locations or on external resources like 
files and databases) are captured in the data dependence graph (explained in the 
next section). 

4.2.2 Data Dependence Graph 

The Data Dependence Graph (DDG), sometimes referred to as Program Dependence 
Graph [37J EH], of a program is a directed multi-graph in which program statements 
are nodes (vertices) and the edges represent data dependencies between the statements. 
The different types of data dependence edges are explained below. 

• A flow- dependence edge ( — >) exists from statement (node) s a to statement s& if s a 
writes a location that s& may read, and Sf, follows s a in the forward control-flow. 

• An anti- dependence edge (— — ») exists from statement s a to statement s& if s a reads 
a location that s& may write, and s& follows s a in the forward control flow. 

• An output-dependence edge ( ►) exists from statement s a to if both s a and 

may write to the same location, and s& follows s a in the forward control flow. 
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• A loop- carried flow- dependence edge ( 



lfd l 



■>) exists from s a to Sf, if s a writes a value 



in the i th iteration of a loop L and s& may read the value in a later iteration (j th 
iteration where j > i). 

• Similarly, there are loop carried counter parts of anti and output dependencies and 



The data dependence graph for the sample UDF of Example 14.11 is shown in Fig- 
ure H2J 

External Dependencies 

Statements may have dependencies not only through program variables but also through 

FD 

the database and other external resources like files. For example, we have Si — ► S2 if 
Si writes a value to the database, which s 2 may read subsequently. Though standard 
dataflow analysis performed by compilers considers only dependencies through program 
variables, it is not hard to extend the techniques to consider external dependencies, at 
least in a conservative manner. For instance, we could model the entire database (or file 
system) as a single program variable and thereby assume every query/read operation on 
a database/file to be conflicting with an update/write of the database/file. In practice, it 
is possible to perform a more accurate analysis on the external writes and reads. When 
referring to external dependencies explicitly, we use E as a superscript to the corresponding 

FD E 

type of dependence edge e.g., si > S2- 

4.3 Program Transformation 

Recall from Section |4.1 .21 that an invocation of an operation q inside a loop L is said to be 
batchable w.r.t loop L if it is possible to rewrite the program into an equivalent program 
where the invocation of q is removed from the body of the loop and a single invocation 
of the batched form qb is made outside the loop. For such a rewrite, it is necessary 
that the operation should be batch-safe. However, the data and control dependencies 
between program statements may make it impossible to batch a statement that invokes 
an operation even if the operation is batch-safe. In this section, we present a set of 
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Figure 4.2: A Subgraph of the Data Dependence Graph for the UDF in Example 14.11 

program transformation rules, which enable us to batch a statement w.r.t. a loop when 
the program satisfies certain conditions. 

The program transformation rules we present, like the equivalence rules of relational 
algebra, allow us to repeatedly refine a given program. Applying a rule to a program 
involves substituting a program fragment that matches the antecedent (LHS) of the rule 
with the program fragment instantiated by the consequent (RHS) of the rule. Some rules 
facilitate the application of other rules and together achieve the goal of batching a desired 
statement w.r.t. a loop. Applying any rule results in an equivalent program and hence 
the rule application process can be stopped at any point. 

Notation Used in the Transformation Rules 

• R(s) : The read-set of s is the set of variables read by statement or statement sequence s. 

• W(s) : The write-set of s is the set of variables written by statement or statement sequence 
s. 

• U(s) : R(s) U W(s). Called the use-set of s. 

• pred? s : Conditional statement. Equivalent to if (pred) then s. 

• LV(s) : Set of variables local to the block statement s. 
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• NLV(s) : Set of variables accessible but not local to the block statement s. These are 
variables defined in an ancestor block. 

• \ss\ : Length of the statement sequence ss 

• ss[i] : Stmt at the i th position in sequence ss, 1 < i < \ss\. 

• s\ + S2 : Concatenation (of statement sequences or strings). 

• SUBS(s, v, v') : Statement obtained by substituting all occurrences of variable v in state- 
ment s by variable v' . 

• SUBS{s,vs,map) : Statement obtained as follows. For each variable v G U(s) n vs, 
substitute all occurrences of v in statement s by map(v). 

• s U* r : Disjoint union (UNION ALL) of relations s and r. 

• a ..., a „( r ) : Projection without duplicate elimination 

• (ai, a2, . . . , a n ) : Tuple constructor 

• type-of(e) : Data type of expression e. 

Predicates on the DDG 

F D 

• s\ ► S2 : True only if the DDG contains a flow-dependence edge (either internal or 

external) from si to S2- 

FD+ 

• si > S2 : True only if the DDG contains a path from si to S2 having only FD edges. 

• si (- FD l LF - p ^ + ) S2 . True only if the DDG contains a path from s\ to S2 having only FD 
or LFD edges. 

• indep(s\, S2) ■ True only if there are no dependencies between statements si and «2- 

• Similarly we have predicates for the existence of other types of dependencies. 

Conventions 

1. Loops of the form "for each t by ref in r" are updatable cursor loops. The underlying 
set r can be modified by assignments to the tuple's attributes. 

2. Suppose a table-valued expression e has arity n. The rename operator p x ( ai ,a2,...,a n )( e ) 
returns the result of expression e under the name x, and with the attributes renamed 
to ai, a 2 , . . . , a n . 
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3. The merge operator M. ai= b lt ..., an =b n (j', s) (based on the SQL:2003 merge construct) 
updates relation r by merging in the records of s. For each record in s that matches 
a record in r on the attributes common to r and s, the record in r is updated 
by assigning the values of attributes b±,. . . ,b n from the s tuple to the attributes 
ai, . . . , a n of the r tuple correspondingly. 

4. Projection that removes the specified attributes: 
n a - 1; <j2,...,a n (?") is equivalent to IIg_ {01,02, ...,a n 

}(r), where 5 is the set of all attributes 

in schema(r) 

5. Multi-assignment from scalar queries: Let q be a query returning exactly one tuple 
of arity n. The assignment i>i,i>2, . . . ,v m — q (where m < n) assigns the values of 
the first m attributes of the returned tuple to the m variables on the LHS in that 
order. 

In all the rules, unless specified, we assume q to be a batch-safe operation with qb as 
its batched form. 

4.3.1 Rewriting Set Iteration Loops (Rule 1) 

In the simplest case, the loop contains a single statement that invokes the operation we 
want to batch. In this rule, we consider cursor update loops - the loop iterates over a 
set of tuples and the values returned by the operation to be batched are assigned back 
to the attributes of the tuple associated with current iteration. Rules 1A through 1C are 
the basic rules that allow replacing a loop by a batched invocation. Rule ID and all the 
other rules presented in this section help us transform the program so as to enable the 
application Rules 1A, IB or 1C. 

In Rule-lA, q can be any batch-safe operation (with or without side-effects). Note 
the use of projection without duplicate elimination (U d ) for constructing the parameter 
multiset. However, in rules IB and 1C, we require q to be a pure function returning 
exactly one tuple (e.g., a scalar query). In rules IB and 1C we construct a duplicate-free 
parameter set using the standard relational projection. This avoids the duplicate record 
problem while merging back the results of the batched invocation. 

For brevity, in rules IB and 1C we omit the form with loop invariant parameters. 
In rules IB and 1C we deal only with assignments to cursor attributes and not variables. 
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Rule 1 Batching Simple Set Iteration Loops 



Rule 1A: Unconditional invocation with no return value 
lA(i) Basic form 

for each t in r loop 

q(t.ci,t.C2,...,t.Cm); g&(II* )Ca) ... iCm (r)); 
end loop; 

lA(ii) Form with loop invariant parameters 
for each t in r loop 

q(t.d,t.c 2 , . . . ,t.c m ,vi,v 2 , • • .,v n ); <==4> qb(W* uC2 ,... >Cm (>) x {(ui,u 2 , . . . ,v n )}); 
end loop; 

Rule IB: Unconditional invocation with return value 
for each t by ref in r loop 

t-C w ±, t.C w 2 , • ■ • j t.C wn — q(t.C r i, t.C r 2 1 ■ ■ ■ j t-C r m)i 

end loop; 

where g is a pure function. 

$ 

At Cw i=c u)1 /,...,c u ,„=c«)n'( r 5 e ) 

where e = fc( Crl ,.,c rm ,c ralI ,.,w)?K n c rt ,.,c rra ('")); 

Rule 1C: Conditional Invocation 

for each t by ref in r loop 

(t.cv == true)? t.c w i, ... , t.c wn = q(t.c r i, ... , t.c rm ); 
end loop; 

where q is a pure function. 

t 

M Cwl=Cwl ,,..., Cwn=cwn i(r, e), where 

e = Px(c rl ,...,c rm ,c wl ,,...,c wn ,)Qb(Tlc rl ,...,c rm (o'cv=true r)); 

Note that in rules IB and 1C we use duplicate-eliminating projection (II) and not (H d ). 

Rule ID: Removal of Order-By 

for each t [by ref] in r order by cols loop for each t [by ref] in r loop 

batch-safe-operation(t) \ > batch- safe- operation (t) 

end loop; end loop; 



The reason for this will be clear when we describe rules 2 and 3. For now, it suffices to 
know that the later transformations bring the program into a form in which we can apply 
one of the rules described in this section. 
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4.3.2 Splitting a Loop (Rule 2) 



In general, the body of a loop may contain several statements along with the query 
execution statement (s) we are interested in batching. In such a case, we try to split the 
loop into multiple loops, such that the statement we are interested in batching appears 
in a loop by itself. Consider the examples in Figures 14.31 and 14.41 The statements to 
be batched are shown in bold. As shown in the figures, we split the loop into multiple 
set-iteration loops. The aim is to have the statement to be batched appear in a loop 
by itself, a form in which we can apply Rule 1. For example, in the rewritten code of 
Figure I4.3[ the loop containing a single INSERT statement can be replaced by a batched 
invocation, by first removing the order by using Rule ID and then applying Rule lA(i). 

for each r in SELECT grantid, empid, gnum FROM grantload loop 
int internalid = foo(r. grantid, r. empid); 

INSERT INTO grants VALUES (internalid, r. empid, r.gnum); 
total + = r.gnum; 
end loop; 

TABLE(key, empid, gnum, internalid) t; 
int loopkey = 0; 

for each r in SELECT grantid, empid, gnum FROM grantload loop 

RECORD(key, empid, gnum, internalid) s; 

s. empid = r. empid; s.gnum = r.gnum; 

int internalid = foo(r. grantid, r. empid); 

s. internalid = internalid; 

s.key = loopkey++; 

t.addRecord(s); 
end loop; 

for each s in t loop order by key 

INSERT INTO grants VALUES (s.internalid, s.empid, s.gnum); 
end loop; 

for each s in t loop order by key 

total += s.gnum; 
end loop; 

Figure 4.3: Splitting a cursor loop 

If the sequence of statements ss in a loop is made up of two consecutive sub-sequences 
ssi, SS2 (i.e., ss = ss\ + SS2), and if there are no loop-carried flow dependencies from any 
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while (top > 0) loop 
top = top — 1; 
curcat = stack[top]; 

catitems = SELECT count (itemid) FROM item WHERE category =curcat; 
totalcount + = catitems; 



TABLE(key, curcat, catitems) t; 
int loopkey = 0; 
while(top > 0) loop 

RECORD(key, curcat, catitems) r; 

top = top — 1; 

curcat = stack[top]; 

r. curcat = curcat; 

r.key = loopkey++; 

t.addRecord(r); 
end loop; 

for each r by ref in t order by key loop 

t. catitems = SELECT count (itemid) FROM item WHERE category =t. curcat; 
end loop; 

for each r in t order by key loop 
totalcount + = t. catitems; 
end loop; 



statement in ss 2 to any statement in ssi or to the loop predicate, then the loop can be 
split such that ss\ and ss 2 appear in separate loops. 

Unlike cursor loops, the iteration space for general while loops cannot be known 
upfront [15] and is constructed dynamically. In general, the loop splitting transformation, 



for the case of while loops, can be expressed as Rule 2j. 

We call a variable as split variable if it is involved in a loop-carried anti or output 
dependency that crosses the split boundaries. While splitting a loop we introduce a table- 
valued variable, which has attributes corresponding to each of the split variables. Each 
loop in the original program, if required to be split, introduces exactly one table. The 
table essentially serves to break the loop-carried anti and output dependencies. We call 
the table associated with loop L in the original program as the L-table. Note that after 

1 In the interest of readability, the earlier examples in Figures 14.31 and 14.41 contained minor variations 
from the construction in Rule 2. 



end loop; 



Figure 4.4: Splitting a while loop 
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Rule 2 Splitting a WHILE Loop 



while p loop 

ssi; s; SS2; 
end loop; 

such that: 

(a) No loop-carried flow dependencies (i.e., LCFD edges, external or otherwise) cross the points 
before and after s. 

(b) No loop-carried external anti or output dependencies cross the points before and after s. 

t 

Table(T) t; 
int loopkey = 0; 
while p loop 

Record(T) r; ss'i, r.key=loopkey++; t.addRecord(r); 
end loop; 

for each r by ref in t order by t.key loop 
s' 

end loop; 

for each r by ref in t order by t.key loop 

ss r ; SS2 
end loop; 
delete t; 

where T, ss^, s' and constructed as follows. 

Let SV (split variables) be the set of variables for which either an LCAD or LCOD edge crosses 
the split boundaries (the edge is incident from SS2 to s or ssi, or from s to ssi). 

1. Table t and record r have attributes corresponding to each variable in SV and a key. 

2. ss[ is same as ssi but with additional assignment statements to attributes of r. Each 
write to a split variable v is followed by an assignment statement r.v = v ;. If the write 
is conditional, then the newly added statement is also conditional on the same guard 
variable. 

3. The statement s' is same as s, except that each reference v to a variable in set SV is 
replaced by r.v. Formally, s' = SUBS(s, SV, map : v — ► r.v) 

4. ss r is a statement sequence assigning attributes of r to corresponding variables. Each 
assignment in ss r is conditional; the assignment is made only if the attribute of r is 
non-null (assigned). 

Note: If the all operations in the loop are batch-safe we can omit the loopkey and the key 
attribute. 

Rule 2A: Splitting a Cursor Loop 

The rule for splitting cursor loops is a minor variant of Rule 2, and we omit the details 
for brevity. 



the split, the newly formed loops that iterate on the L-table must have an ORDER BY 
clause. The ORDER BY clause can then be eliminated using Rule ID if the statements 
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inside the loop are batch-safe. 

Although we succeed in splitting a loop by breaking loop-carried anti and output 
dependencies, loop-carried flow dependencies prohibit splitting of the loop (see precondi- 
tion (a) of Rule-2). For example, if a statement in SS2 wrote a variable whose value is 
read in ssi or by the loop predicate, then Rule 2 does not apply. However, later in this 
chapter we shall see that in some cases it is possible to reorder the statements within a 
loop so as avoid such loop-carried flow dependencies. All types of loop-carried external 
dependencies prohibit splitting the loop (see preconditions (a) and (b) of Rule-2). Recall 
that external dependencies are data dependencies through external resources like files. 
Rule 2 generalizes rule T4 of Lieuwen and DeWitt [31] . We compare our work with [31] 
in Section 14.71 

4.3.3 Separating Batch-Safe Operations (Rule 3) 

A program statement may contain the expression we want to batch in combination with 
other non batch-safe operations. In such a case, we isolate the batch-safe operation by 
introducing an extra variable. Figure 14.51 shows an example and Rule 3 expresses the 
transformation formally. 

Rule 3 Isolating Batch-Safe Expressions 

Let e be a batch-safe expression in statement stmt. Then, 

stmt; < £4 > T v = e; stmt'; 
where stmt' = SUBS(stmt,e,v) and T —type-of(e); 

Variable assignment is not batch-safe in general, e.g., assignment to a global variable. 
However, assignments to different locations, e.g., different rows of a cursor loop, can be 
performed in any order and hence batch-safe in the context of split variables that are 
converted to attributes of the L-table by Rule 2. If the return value of a query is assigned 
to a non-local variable, applying Rule 3 introduces a new split variable and thus enables 
batching the query. 

4.3.4 Control to Flow Dependencies (Rule 4) 

Conditional branching (if-then- else) and while loops lead to control dependencies. If the 
predicate evaluated at a conditional branching statement si determines whether or not 
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^ . . , ^ r (ii) After applying Rule-3 

(i) Original Code v , ; . . *T J & 

u-i , m , while (n > 0) loop 
while (n > 0) loop v . ' , 

- r__ 1- x = s t — n l ; 

// of) /s to be batched; printf) is not batch-safe 'J. et e ^ype-o (q(...JJ 

■ !t u T v = q(x): 

prmt(q(x)); . , , 

, , pnnt(v); 
end loop; . . 

end loop; 



(iii) After splitting the loop 

Table(...) t; int loopkey = 0; 
while (n > 0) loop 

Record (. . .) r; 

x = s[ n]; 

r.x = x; r.key = loopkey++; t.addRecord(r); 
end loop; 

// order-by removed with Rule ID 
for each r by ref in t loop 

r.v = q(r.x); 
end loop; 

// order-by is required 

for each r in t order by t.key loop 

print(r.v); 
end loop; 



Figure 4.5: Separating Batch-Safe Operations 



for each t by ref in sales loop 

/ / Using a control variable remember 
1 1 the branching decision 
boolean cv = (t.brcode == 58); 
(cv==true)? t.brcode = 1; 
(cv==true)? q(t.item, t.qty, 
t.brcode); 

end loop; 

Note: After this transformation, we can apply Rule-2 and split the loop. The conditional invo- 
cation of q can then be batched using Rule 1 C. 

Figure 4.6: Transforming Control-Dependencies to Flow- Dependencies 



for each t by ref in sales loop 
if (t.brcode == 58) 
t.brcode = 1; ^=4^ 
q(t.item, t.qty, t.brcode); 
end if 
end loop; 



control reaches statement s2, then s2 is said to be control dependent on si. During loop 
split, it may be necessary to convert the control dependencies into flow dependencies |29j . 
Figure 14.61 shows an example. Rule 4 specifies the transformation formally. 
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Rule 4 Converting control-dependencies to flow-dependencies 
if (p) { ssi } else { ss 2 } 

t 

boolean cv = p; 
ss 

where ss[i] = (cv == true)lssx[i], 1 < i < \ssi\ and 
ss[k + j] = (cv == /a/se)?ss 2 [j], 1 < j < \ss 2 \, k = \ss\\ 



4.3.5 Reordering Statements (Rule 5) 

Consider the example in Figure 14.71 Suppose we want to batch the query invocation 
q(category) in statement si. We cannot directly split the loop so as to batch si because 
there are loop-carried flow- dependencies from s3 to si and to the loop predicate, which 
violate pre-condition (a) of Rule 2. Statement s3, which appears after si, writes a value 
and statement si reads it in a subsequent iteration. We therefore reverse the order of 
statements si and s3 before splitting the loop (Figure \4.7\i . Intuitively, we first collect 
all the categories in the hierarchy and then perform a batched invocation of the query 
that computes the item counts for the categories. The basic rules that allow us to reorder 
statements are specified in Rule 5. To be able to split a loop so as to batch the desired 
statement, multiple applications of Rule 5 may be needed. It is important that Rule 5 
be applied in a carefully chosen sequence so as to achieve the desired reordering. We will 
return to this issue in Section I4.4[ where we give an algorithm to reorder statements such 
that the pre-conditions for Rule 2 are met. 

4.3.6 Batching Across Nested Loops (Rule 6) 

Loops in a program may be nested within other loops and form a hierarchy. The query 
or update operation we are interested in batching may lie anywhere in the loop hierarchy. 
It is often desirable to batch the query or update operation w.r.t. as many ancestor loops 
as possible. The aim here is to make fewest possible calls to the expensive operation, in 
other words, to make the size of the batch in each invocation as large as possible. 

Consider a loop L c nested under loop L p and a query q inside L c . When the child 
loop (L c ) is split using Rule 2, a TABLE valued local variable, L c -table, is introduced in 
the parent loop (L p ). With the application of Rule 1, q is pulled out of L c and is replaced 
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Original Program 
sO: while (category != null) loop 

si: int icount = q(category); // Query to batch 

s2: sum = sum + icount; 

s3: category = get Parent Category (category); 

end loop; 

t 

After Order Reversal 
sO: while (category != null) loop 
si': int category _stub = category; 
s3: category = get Parent Category (category); 

si: int icount = q(category_stub); 

s2: sum = sum + icount; 

end loop; 

t 

After Loop Split 

TABLE(...) r; 

int loopkey = 0; 

while (category != null) loop 

RECORD(...) t; 

int category _stub = category; 

t. category _stub = category _stub; 

category = get Parent Category (category); 

t.key = loopkey++; 

r.addRecord(t); 
end loop; 

for each t by ref in r loop 

t. icount = q(t. category _stub); 
end loop; 

for each t in r order by key loop 

sum = sum + t. icount; 
end loop; 

Figure 4.7: Reordering Statements to Satisfy Pre-Condition of Rule-2 

by qb that lies directly inside L p . In turn, when the parent loop is split, the L c -table 
(of the child loop) becomes a TABLE valued attribute (nested table) in the parent's L p - 
table. We now perform a second level batching of qb w.r.t. L p by unnesting the L p -table. 
Rule 6 enables this transformation. Intuitively, we first try to pull the statement out of 
the inner most loop enclosing it and then out of the next (higher) level loops. Figure ID.5I 
in Appendix [D] gives a complete example of batching a statement across nested loops. 

In Rule 6 we make use of the nest[y) and unnest(^) operators of nested relational 
algebra [H [3] . Below we give a brief description of these operators and refer to [8] for the 
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Rule 5 Basic Rules that Facilitate Reordering of Statements 
Rule 5A: Reordering Independent Statements 

Two statements can be reordered if there exists no dependency between them. 
Si, s 2 ; where indepfsi, s 2 ) -\=4> s 2 ,S\] 
Rule 5B: Shifting an Anti-Dependence Edge 

An ant i- dependence edge between two statements can be shifted by using an extra vari- 
able. 

si; s 2 ; 

i AD V 

where S\ > s 2 

t 

v' = v; s[; s 2 ; 

where s[ is constructed from s\ by replacing all reads of v by reads of v'. 
Rule 5C: Shifting an Output-Dependence Edge 

, OD v 

where si > s 2 

t 

Sl ; s ' 2 ;v = v 1 ; 

where s 2 is constructed from s 2 by replacing all writes of v by write to v '. 
formal definitions. 

Nest: The nest operator (z/g^ s (r)) groups the tuples of r on attributes schema(r) — S, 
then for each group forms a single tuple with a relation valued attribute s containing the 
S values of the tuples grouped together. 

Unnest: The unnest operator (/i s (r)), where s is a relation valued attribute of r, performs 
the inverse operation of nest. 

/ i «( r ) = Uter(nfl{£} x where R is the set of all attributes in schema(r) excluding s. 
Though we use a nested relational model for the L-tables, our techniques are easy to 
implement on any RDBMS by storing the nested tables separately. 



4.3.7 Correctness of Transformation Rules 



We give here a brief description of our approach for proving the correctness of the program 
transformation rules. Appendix [B] gives formal proofs of correctness. 
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Rule 6 Batching Across Nested Loops 



Let s be a table valued attribute of table r, and let S —schema(r.s). 

Rule 6A. No Return Value 
for each t in r loop 

qb((t.c u . . .,t.c n ) x t.s); 
end loop; 

t 

qb(U d A (ii s (r))) where A = {c u . . . , c n } U S 

Rule 6B: With Return Value 
for each t in r loop 

M Cl=c > 1 ,..., Cn=cn '(t.s,qb(t.s)) 
end loop; 

where qb is a pure function. 

t 

rs = /i s (r); 

M Cl=c > i) ... yCn=cn <(rs,qb(Tls(rs))); 
r = v s ^ s (rs); 



Let P L be a program fragment that matches the LHS of a rule and P R be the program 
fragment instantiated by the corresponding RHS. Let p be the program position at which 
Pl begins. Let (G, S) be the pair of any valid program and system states at p. The 
program state G comprises of values for all variables accessible at the program position p 
and the system state S comprises of the state of all external resources like database and 
file system. To prove the correctness of a transformation rule, we must show the following. 
If the execution of P L on (G, S) results in the state (G", S') then the execution of Pr on 
(G,S) will also result in the state (G',S'). Note that we assume intermediate program 
and system states are not observable. This is a valid assumption in many practical 
applications. The correctness of many rules directly follows from the definition of batch- 
safe operation and that of batched forms. In some cases, we need to show the multiset 
equivalence of the L-tables, which is being updated, at the end of the execution of Pl 
and Pr. The proof of correctness of Rule 2 uses an argument on the values seen by each 
statement in the i th iteration (1 < % < n), where n is the number of loop iterations. 
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4.4 Control Algorithm for Rule Application 



Rewriting a program for set-orientation involves the following steps: (i) identify itera- 
tively invoked query execution statements, (ii) decide whether it is beneficial to batch the 
query execution and the ancestor loop (in the hierarchy of loops enclosing the statement) 
with respect to which the statement must be batched and (Hi) rewrite the program by 
systematically applying the transformation rules presented in the previous section. 

procedure batch(Stmt s, Loop 1) 
Inputs: 

s: The query execution statement to be batched 

1: A program loop w.r.t. which the query must be batched [i.e., s must be pulled out of 1). 
s may be present directly inside 1 or within a descendant loop of 1. 

Note: We assume the above inputs are provided manually; cost-based decision 
is a future work. 

Goal: 

Rewrite the program to batch the query execution in statement s w.r.t. loop 1. 
begin 

Let lp be the loop which directly encloses s. 

// Batch s w.r.t. lp. Let the batched statement be sb. 

sb = do-batching(s, lp); 

if (lp != 1) 

Let lpp be the parent loop of lp 

batch(sb, lpp); 

end; 

procedure do-batching(Stmt s, Loop 1) 
begin 

Rewrite s using Rule-3 so that s is a simple assignment with only the query invocation 
expression on its RHS. 

If s is control dependent on any statement inside loop I (other than the loop predicate), 
convert the control-dependency to flow dependency (using Rule-4). 

Reorder the statements in 1 to satisfy pre-conditions for Rule-2 (making use of Rule 5). 
reorder(s, /.block); // Procedure reorder s described later (Figures [4.91 and 14.111) 

Applying Rule-2 split the loop 1 at the program points before and after s. 

Batch the query execution using Rule 1 or Rule 6. 

Return the reference to statement sb, the batched form of s. 

end; 

Figure 4.8: Control Algorithm for Rule Application 
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Identifying the query execution statements in a loop is usually straight forward. 
However, the decision on whether a statement should be batched and the level in the 
loop hierarchy up to which the statement must be batched, requires a cost-based analysis. 
Cost-based analysis is a future direction for our work and some of the parameters needed 
for cost-based analysis are discussed in Chapter [5j In this section, we assume these two 
inputs (the query invocation to be batched and the ancestor loop up to which the query 
must be batched) are available from the user. Given a query execution statement and 
a loop with respect to which the statement must be batched, the transformation rules 
presented in this chapter can be used to rewrite the program. However, it is important 
to apply the rules in a systematic way so as to achieve the goal of batching the given 
statement. 

In Figure 14.81 we give the procedure for applying the rules so as to batch a given 
query execution statement w.r.t. a given ancestor loop enclosing it. The procedure batch 
recursively pulls out the given statement, starting from the inner most loop enclosing 
it. Procedure do-batching performs the actual task of rewriting by applying the rules. 
First, we apply Rule-3 on the statement and ensure the RHS contains only the query 
execution expression. We then convert all the control-dependencies to flow-dependencies 
by applying Rule-4. This allows us to treat the entire body of the loop as a basic block 
(a straight-line sequence of statements with no branches into or out of the sequence). We 
perform a reordering of the statements (if needed) to satisfy the pre-conditions for Rule-2, 
and then use Rule 2 to split the loop at the program points which immediately precede 
and succeed the query execution statement. This leaves the query execution statement 
in a loop by itself - a form in which we can apply Rule-1 or Rule-6 and make use of the 
batched form. Rule-1 gets applied for the inner most loop enclosing the statement and 
Rule-6 gets applied for the higher level loops. 

Reordering Statements to Enable Loop Splitting 

As described in Section H.3.51 to facilitate splitting of a loop, it may be necessary to reorder 
the statements within a loop. A loop can be split using Rule-2 only if there are no loop- 
carried flow dependencies that cross the split boundaries. Section 14.3.51 gives an example 
and specifies the basic transformations (Rule 5) that enable reordering of statements. The 
basic transformations specified in Rule 5 must be applied in an appropriate sequence to 
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procedure reorder (BasicBlock b, Stmt s q ) 

II Goal: Reorder the statements within b, such that no LCFD edges cross 

// the program points immediately preceding and succeeding s q . 

II Assumption: s q does not lie on a true-dependence cycle in the subgraph 

// of the DDG induced by statements in b. 

begin 

while there exists an LCFD edge crossing the split boundaries for s q 
Pick an LCFD edge (v±,V2) crossing the split boundaries. 

if there exists a true-dependence path from v\ to s q 
/* No true-dependence path from s q to v\ */ 

stmtToMove = s q ; 

targetStmt = v\ ; 

else 

/* No true-dependence path from v\ to s q , which implies no true-dependence 
path from V2 to s q as there exists an LCFD edge from vi to v% */ 
stmtToMove = V2; 
targetStmt = s q ; 

II Move stmtToMove past the targetStmt 

Compute srcDeps, the set of all statements between stmtToMove and targetStmt, 
which have a flow dependence path from stmtToMove. 

while srcDeps is not empty 

Let v be the statement in srcDeps closest to targetStmt 
moveAfter(w, targetStmt); // see Figure T4.11I 

moveAiter(stmtToMove, targetStmt); 

end; 

Figure 4.9: Procedure reorder 

achieve the desired reordering of statements in a loop. We now give an algorithm to do so. 
The goal is to reorder the statements such that no loop-carried flow dependencies cross 
the desired split boundaries. We make use the following definition in the description to 
follow. 

Definition 4.1 A true- dependence path (or cycle) in a data dependence graph is a di- 
rected path (or cycle) where each edge represents either a flow- dependence (FD) or a 
loop- carried flow- dependence (LCFD). 

Note that a true-dependence path excludes anti, output, loop-carried anti and loop-carried 
output dependence edges. 

The algorithm reorder, shown in Figure 14.91 works as follows. For each loop-carried 
flow dependence edge that crosses the split boundaries (the two program points in the 
basic block that immediately precede and succeed the statement to batch), the algorithm 
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Case-1 Case-2 Case-3 Case-4 

Move s q past vl Move v2 past s q 

Figure 4.10: Cases for Reordering Statements 



decides the statement to move, and its target position. There are four cases to consider 
while deciding the statement to move and its target position. The cases are shown in 
Figure 14.101 The way in which we choose the stmtToMove and targetStmt ensures the 
following. If s q , the statement to be batched, does not lie on a true-dependency cycle, 
then there exists no true-dependence path from the stmtToMove to the targetStmt. We 
then compute two sets srcDeps and tgtDeps. srcDeps is the set of statements present 
between stmtToMove and targetStmt, and which have a path of flow-dependence edges 
from stmtToMove. tgtDeps is the set of statements present between stmtToMove and 
targetStmt, and from which there exists a path of flow- dependence edges to the target- 
Stmt. Note that the two sets will be disjoint. Each statement in srcDeps is then moved 
past the targetStmt using the moveAfter procedure. The procedure moveAfter (shown in 
Figure H.lip performs the required reordering by swapping pairs of adjacent statements. 
While doing so, the procedure resolves any anti and output dependencies by creating stub 
statements, which make use of temporary variables. 

Figures 14.121 through 14.141 show examples illustrating statement reordering. Fig- 
ure 14.151 shows the data dependence graph for the original and reordered code of Fig- 
ure 14.141 In Figure 14.151 for each flow-dependence (FD) edge from x to y, there exists 
a corresponding loop-carried anti-dependence (LCAD) edge from y to x but these edges 
are not shown. Similarly, AD and OD edges have corresponding LCFD and LCOD edges 
respectively, which are not shown. In this example, si is the statement containing the 
query to batch. The LCFD edge from s4 to si crosses the split boundary and hence si 
must be moved past s4. As can be seen in Figure I4.15[ after the reordering, no LCFD 
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procedure moveAfter(Stmt s, Stmt t) 
External variables used: 

List srcDeps, Stmt s q / / Constructed by procedure reorder in Figure l4~9l 
begin 

if s succeeds t in the basic block 
return; 

Stmt next = successor(s); 
do { 

if no flow/anti/output dependence edges between s and next 

/* Reorder the statements by applying Rule-5A */ 

swap s and next; 
else { 

for each OD v edge from s to next { / / OD v : output dependence on variable v 
/* Shift the OD edge by applying Rule-5C */ 
Replace writes to v in next by writes a new variable v'; 
Insert a new statement as' v that assigns v' to v immediately after next; 
move After (as' v , t); 

} 

for each AD V edge from s to next { / / AD V . anti-dependence on variable v 
/* Shift the AD edge by applying Rule-5B */ 

if there exists an AD V edge from s q to next / / Use a reader stub 

Insert a new statement as' v that assigns v to a new temp variable v' 

immediately before s; 

Replace all read references to v in s by v'; 
else // Use writer stub 

Replace write of v in next by write to a new temp var v'; 

Insert a new statement as v that assigns v' to v immediately after next; 

move After (as,;, t); 

} 

swap s and next; 

} 

lastStmt = next; 
if (lastStmt ! = t) 

next = successor(s); 

} 

\Nh\\e(lastStmt ! = t) ; 

end 



Figure 4.11: Procedure move After 



while(category != null) loop 

(si) icount = q(category); 

(s2) sum = sum + icount; 

(s3) category = getParent(category); 

end loop; 



while(category != null) loop 

(tsl) categoryl = category; 

(s3) category = getParent(category); 

(si) icount = q(categoryl); 

(s2) sum = sum + icount; 

end loop; 



Figure 4.12: Example-1 of Statement Reordering 



104 



while(top > ) loop 

(s6) top = top-1; 

(s7) curcat = stack[top]; 

(s8) catitems = q(curcat); 

(s9) totalcount = totalcount + catitems; 

(slO') stack, top = block(curcat, top); 

end loop; 



Move s8 past slO' 



while(top > ) loop 

(s6) top = top-1; 

(s7) curcat = stackftop]; 

(slO') stack, top = block(curcat, top); 

(s8) catitems = q(curcat); 

(s9) totalcount = totalcount + catitems; 

end loop; 



Figure 4.13: Example-2 of Statement Reordering (from the UDF in Example 14.11) 



while(pred(c)) loop 

(s2) cv2? a3,c = f(x); 
while(pred(c)) loop (ni) b2 = b; 

(si) cvl? a = q(b); (n2) b5 = b; 

(s2) cv2? a,c = f(x); Movc sl past s4 (s4) cv3? al,b = h(c); 

(s3) d = g(a, b); k (si) cvl? a = q(b5); 

(s4) cv3? a,b = h(c); (n3) cv2? a = a3; 

end loop; (s3) d = g(a, b2); 

(n4) cv3? a = al; 

end loop; 



Figure 4.14: Example-3 of Statement Reordering 




edges cross the split boundary. The LCAD edges crossing the split boundary do not pro- 
hibit splitting of the loop, because these will be removed by the introduction of the loop 
local table while splitting the loop. 
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Table(key, al, . . ., b5, cvl, . . ., cv3) t; 
int loopkey = 0; 
while(pred(c)) loop 

Record r; 
(s2) cv2? a3,c = f(x); 
(s2') cv2? r.a3 = a3; 
(ni) b2 = b; ( n r) r.b2 = b2; 
(n2) b5 = b; (n2') r.b5 = b5; 
(s4) cv3? al,b = h(c); (s4') cv3? r.al = al; 

r.key = loopkey++; 

t. add Record (r); 
end loop 

for each r by ref in t order by key loop 
(si) r.cvl? r.a = q(r.b5); 
end loop; 

for each r by ref in t order by key loop 
boolean cvl, cv2, cv3; 
int al, a3, b2; 

assigned(r.cvl)? cvl = r.cvl; 

assigned(r.b5)? b5 = r.b5; 

(n3) cv2? a = a3; 
(s3) d = g(a, b2); 
(n4) cv3? a = al; 
end loop; 

Figure 4.16: Loop Splitting Applied to Reordered Code in Figure I4~T41 



4.5 Applicability of Transformation Rules 

Our program transformation algorithm succeeds in rewriting fairly complex programs for 
batched bindings. However, it may not be always be possible to rewrite a program to 
batch the invocation of a specific operation. As an example, consider the program and its 
DDG shown in Figure 14.171 We can batch the query invocation in statement s2 but not 
the one in statement si. The query invocation in statement si lies on the true-dependence 

F D LF D 

cycle si > s4 > si and hence we cannot reorder the statements so as to satisfy the 

pre-conditions of Rule-2. Similarly, in the DDG of Figure 14.21 the query invocation in 
statement s8 is batchable, whereas the one in statement slO is not batchable. 

Flow dependencies that result from control-dependencies (Rule 4) must be taken 
into account while checking for the presence of a true-dependence cycle. For instance, 
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sO: while(eid ! = NULL) loop 
si: mgr =SELECT manager 

FROM emp WHERE empid=eid; 
s2: idx = SELECT perfindex FROM rating 

WHERE reviewer=mgr and reviewed=eid; 
s3: sumidx += idx; 
s4: eid = mgr; 



end loop; 




Figure 4.17: Cyclic True-Dependencies 

statement s2 in the code fragment of Figure 14.181 lies on true-dependence cycle after 

converting the control dependence of s2 on sO into a flow dependence. Intuitively, an 

operation cannot be batched if the execution of the operation in any iteration depends 

on the value it returned in a previous iteration. 

sO: while(x < n) loop 
si: y = y - 1; 

s2: x = q(y); 

end loop; 

Figure 4.18: True-Dependence Cycle Created Due to Control-Dependency 



Condition for Batching 

In this sub-section, we state and prove the condition under which procedure reorder 
can reorder the statements in a loop such that a given statement s q can be batched. 
Before presenting the theorem and its proof, we introduce some useful terms and prove a 
supporting lemma. 

Let b be the basic block of statements containing s q , the statement to be batched. 
Let C be the set of LCFD edges that cross the split boundaries of s q in the given basic 
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block b. Let T be the set of LCFD edges (vi,Vz) such that both v% and Vi succeed s q in 
b. For example, in Figure H~T9"} C = {(s 2 , Si), (s 3 , s q )} and T = {(s 3 , s 2 )}. 



C={(s2,sl), (s3, sq)j 
T={(s3,s2)} 

Figure 4.19: Example of Dependence Edges in the C and T Sets 
We now state and prove a lemma, which characterizes the behavior of procedure moveAfter. 

Lemma 4.2 Each call to procedure moveAfter(s, t), made from procedure reorder, satis- 
fies the following pre-conditions and post-conditions. 
Pre-conditions for procedure moveAfter(s, t) 

(R.l) t = targetStmt (the original target statement assigned in procedure reorder) 
(R.2) No true- dependence path from s to t exists. 

(R.3) No statement between s and t has a true- dependence path from s. 
Post-conditions for procedure moveAfter(s, t) 

(T.l) \srcDeps\ decreases by at least 1 if s was in srcDeps. 

(T.2) \C\ + \T\ does not increase when s ^ stmtToMove, and \C\ + |T| decreases by at 
least 1 when s = stmtToMove (here, stmtToMove is the original statement to be 
moved past the targetStmt, and is assigned in procedure reorder) 

(T.3) Program correctness is preserved. 

(T.4.) The procedure terminates, and s is moved past t. 

Proof: First, we prove that all the calls to procedure moveAfter, made from procedure 
reorder satisfy the three pre-conditions R.l, R.2 and R.3. We then prove that procedure 
moveAfter ensures the post-conditions whenever the pre-conditions hold. 
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(R.l) All calls made to procedure move After horn procedure reorder have the targetStmt 
as parameter t. All recursive calls within procedure moveAfter do not change 
parameter t. Hence, pre-condition R.l holds. 

(R.2) (i) If s q does not lie on a true-dependence cycle, then procedure reorder picks 
stmtToMove and targetStmt such that there exists no true-dependence path 
from stmtToMove to targetStmt. This can be observed in Figure H.101 

(ii) Each statement v in srcDeps has a true-dependence path from stmtToMove 
and hence there cannot be a true-dependence path from v to targetStmt (oth- 
erwise, stmtToMove would be on a true-dependence cycle). 

(Hi) Observations (i) and (ii) above prove that pre-condition R.2 holds for all calls 
to procedure moveAfter made from procedure reorder. 

(R.3) For all the calls to procedure moveAfter, which are made from procedure reorder, 
this condition is seen to hold from the way we pick statement v (passed as argument 
s for moveAfter) . 

Post-conditions 

We now show the following: If the pre-conditions for procedure moveAfter are met then, 

the post-conditions and pre-conditions for all recursive calls will be met. 

Case I: s ^ stmtToMove 

Case I A: t immediately follows s. 

IA.l No flow/ anti/ output dependencies exist from s to t. 

Procedure moveAfter swaps s and t. Post-condition (T.l) holds because s no longer 
lies between stmtToMove and t and is accordingly removed from srcDeps. Post- 
condition (T.2) holds because no changes happen to C and T (because s is a state- 
ment that lies after stmtToMove). The program correctness is preserved (Post- 
condition (T.3) holds) because no flow/anti/output dependencies exist between s 
and t; the do loop (and hence the procedure) terminates after the first iteration and 
s is moved past t (Post-condition (T.4) holds). 

IA.2 A single output- dependence edge (OD v ) exists from s to t. 

Transformations made by procedure moveAfter are shown pictorially in Figure 14.201 
We first note that the recursive call to moveAfter(as' v ,t) satisfies the pre-conditions 
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for moveAfter. as' v appears after t and hence the recursive call terminates. We then 
swap s and t. 

Post-conditions (T.l) and (T.2) hold for the same reasons stated for Case IA.l. 
Rewriting the LHS of t with v' makes s and t independent and hence the swap 
preserves correctness. The newly introduced copy-back statement, which assigns v' 
to v, ensures later statements see the correct value of v. Hence, post-condition (T.3) 
(correctness) holds. The do loop (and hence the procedure) terminates after the first 
iteration and s is moved past t (Post- condition (T.4) holds). 




Figure 4.20: Pictorial Depiction of Reordering with OD Edge 

IA.3 A single anti- dependence edge (AD V ) exists from s to t. 

Assume there exists an AD V edge from s q to next. Transformations that proce- 
dure moveAfter makes are shown pictorially in Figure 14.211 The newly introduced 
statement as' v , which preserves the value of v in v', cannot be part of srcDeps for 
the following reasons. Assume as' v has a true-dependence path from stmtToMove, 
which in this case must be s q . This implies either s q or some statement having a 
true-dependence path from s q writes v. Let this statement be p. This implies a 
true-dependence cycle involving s q and p since s q reads v and p writes v. Since it is 
given that s q does not lie on a true-dependence cycle, we conclude that as' v cannot 
have a true-dependence path from s q and hence not part of srcDeps. Now, it is 
straight-forward to see that the post conditions hold. 

Assume there exists no AD V edge from s q to next. Transformations that procedure 
moveAfter makes are shown pictorially in Figure 14.221 It is straight-forward to see 
that the pre-conditions for the recursive call hold and that the subsequent swap of 
s and next will be correct and the transformation satisfies the post-conditions. 
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Figure 4.21: Pictorial Depiction of Reordering with AD Edge (R-Stub) 




Figure 4.22: Pictorial Depiction of Reordering with AD Edge (W-Stub) 

IA.4 A flow- dependence edge exists from s to t. 

This case is ruled out from pre-condition R.2. 

IA.5 Multiple AD and OD edges exist from s to t. 

Each of the AD and OD edges can be treated independently as is done in procedure 
moveAfter. And we can see that the post-conditions continue to hold. 

Case IB: t does not immediately follow s (there exist other statements between s and t). 

From pre-condition R.3, the only possible dependencies between s and any statement 
u between s and t are AD and OD. We can keep swapping s and its successor till s is 
moved past t. Each swap preserves the pre and post conditions as argued above. 

Case II: s = stmtToMove 

Case IIA: t immediately follows s. 



IIA.l No flow /anti/ output dependencies exist from s to t. 

Procedure moveAfter swaps s and t. Let k be the number of LCFD edges incident 

111 



on t. Now, the change in the number of edges in sets C and T is as follows (see 
also Figure 14.101) . 

If stmtToMove=s q : \C new \ < \C otd \ + k-l and \T new \ < \T M \ - k 
If stmtToMove ^ s q : \C new \ < |C oW | - k and \T new \ < |T oW | + k - 1 
In both the above cases, |C| + |T| decreases by at least 1. Hence we can see that post- 
condition (T.2) holds. Post- condition (T.l) is satisfied since stmtToMove is not in 
srcDeps. The swap preserves the correctness since there are no flow/anti/output 
dependencies from s to t (post-condition (T.3) holds). Post-condition (T.4) holds 
as argued in IA.l. 

IIA.2 A single output- dependence edge (OD v ) exists from s to t. 

The resulting dependencies after the transformation are pictorially shown in Fig- 
ure S2UJ It can be seen that, \C\ + |T| decreases after the transformation. 

IIA.3 A single anti- dependence edge (AD V ) exists from s to t. 

The AD edge is moved using a reader stub if s = s q and is moved using a writer side 
stub otherwise. The dependencies after this reordering are shown in Figures 14.211 
and 14.22] Again, it can be seen that \C\ + \T\ decreases after the transformation. 

IIA.4 A flow- dependence edge exists from s to t. 

This case is ruled out from pre-condition R.2. 

IIA.5 Multiple AD and OD edges exist from s to t. 

Each of the AD and OD edges can be treated independently as is done in procedure 
moveAfter. And we can see that the post-conditions continue to hold. 

Case IIB: t does not immediately follow s (there exist other statements between s and t). 

This case is similar to Case IB, except that when s is eventually moved past t, \C\ + |T| 
strictly decreases. Hence the post-conditions hold. 

This completes the proof of Lemma I4.2L □ 
We now state and prove our main theorem. 

Theorem 4.3 Given a basic block of code b and statement s q in b such that s q does not 
lie on a true- dependence cycle in the DDG, procedure reorder terminates, reordering the 
statements of b such that: 
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(a) No LCFD edges cross the program points that immediately precede and succeed s q . 

(b) Program correctness is preserved (i.e., the reordered block is equivalent to the original) 

Proof: First, we prove that procedure reorder terminates. 

1. The inner while loop of procedure reorder executes until the set srcDeps is empty. 
In each iteration, a call to procedure moveAfter is made, with an element of srcDeps 
being passed as the parameter s. From the post-condition (T.l) of Lemma [4.21 we 
know that, after each call to moveAfter, the size of srcDeps decreases by at least 1. 
Hence, the inner while loop of procedure reorder must terminate. 

2. The outer while loop of procedure reorder executes until \C\ = 0. In each iteration 
of the outer while loop, the last call to procedure moveAfter is made with s being set 
to stmtToMove. Hence, from the post-condition (T.2) of Lemma fl~2l we know that 
\C\ + |T| decreases by at least 1 at the end of each iteration of the outer while loop. 
None of the other calls to the procedure moveAfter cause any increase in \C\ + |T| 
(follows from the post-condition (T.2) of Lemma l4.2p . Hence, the outer while of 
procedure reorder must terminate. 

3. Each call to procedure moveAfter, made from procedure reorder, terminates (post- 
condition (T.4) in Lemma [4. 2p . 

From 1, 2 and 3 above, it is straight-forward to observe that procedure reorder terminates. 

Any reordering of statements within the basic block b are performed by procedure 
moveAfter and not directly by procedure reorder. Hence, the post-condition (T.3) of 
Lemma 14.21 suffices to prove that procedure reorder preserves program correctness. 

When procedure reorder terminates, no LCFD edges cross the program points that 
immediately precede and succeed s q . This directly follows from the termination condition, 
\C\ = 0, of the outer while loop of procedure reorder. 
This completes the proof of Theorem 14.31 □ 

4.6 Experimental Results 

Our rewrite rules can conceptually be used with any language. However, to implement 
the rules we need to perform dataflow analysis of a program and build the data depen- 
dence graph. For our implementation, we chose Java, since tools for its dataflow analysis 

113 



are available in public domain. Our implementation uses the SOOT optimization frame- 
work [52J. SOOT uses an intermediate code representation called Jimple and provides 
dependency information on Jimple statements. Our implementation transforms the Jim- 
ple code using the dependence information. Finally, the Jimple code is translated back 
into a Java program. 

Our current implementation requires that queries and updates be performed using 
our API layer built on top of JDBC During rewrite we recognize these calls and transform 
them for batched bindings when possible. We have not yet implemented query rewriting 
to get batched forms and this step is done manually. The techniques for deriving batched 
forms of queries are well known and we expect the implementation to be straight-forward. 
Tables (batches) used in the rewritten procedures are constructed in-memory and trans- 
ferred to the database before evaluating the batched queries. Nest/unnest and merge 
operations are performed on these in-memory tables. 

There are no benchmarks for procedural SQL that we could use for our experiments. 
However, we had seen three real-world applications which were facing serious performance 
problems due to non-set-oriented execution, which were affecting their usability. We use 
these scenarios for our experiments. Our current implementation does not support all the 
transformation rules presented in this chapter. Hence, in some cases part of the rewriting 
was performed manually in accordance to the transformation rules. We do not have access 
to the actual data used in these applications and hence we used synthetic data. In one 
case we used TPC-H data as it matched the scenario. As we cannot report timings on 
the actual application code, we used independent programs having only the code required 
for the specific scenarios. The experiments were performed on a widely used commercial 
database system (we call it SYS1) running on an Intel P4 (HT) PC with 1GB of RAM. 

Experiment 1: Traversal of Category Hierarchy 

For this experiment, we used a program, which is a slight variant of the UDF in Exam- 
ple 14.11 The program finds the item (part) with maximum size under a given category 
(including all its sub-categories) by performing a DFS of the category hierarchy. For each 
node (category) visited, the program queries the item table. The TPC-H part table, aug- 
mented with a new column category-id and populated with 2 million rows, was used as 
the item table. The category table had 1000 rows - 900 leaf level, 90 middle level and 10 
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Figure 4.23: Performance Results for Experiment 1 



top level categories (approximately). A clustering index was present on the category-id 
column of the category table and a secondary index was present on the category-id column 
of the item table. All relevant statistics were built. Figure 14.231 shows the performance of 
the program before and after rewrite. 

For the non-batched query on the item table, SYSl's default choice was to use the 
secondary index. This plan results in a lot of random 10, and we found an alternative plan, 
which performs a sequential scan takes less time since the entire relation is brought into 
memory on the first invocation, and there is no IO on subsequent invocations. Since this 
plan was found to be cheaper, we enforced it using optimizer hints. Figure 14.231 compares 
the time taken by the best plan for the original program with the batched version. The 
batched version, in this case, performed a group-by followed by a join, whereas the original 
program repeatedly executed a query that performed selection followed by group by; as 
a result the batched version showed much better performance. In this experiment, the 
performance of the batched form was almost independent of the batch size because the 
group-by query computed the results for all the parameters in each case. 

In transforming the program, Rule-5 (reordering), Rule-2 (loop splitting) and Rule-1 
(batching) were applied in that order. There was a 12.5% increase in the program size 
(lines of code) due to the transformation. 
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Figure 4.24: Performance Results for Experiment 2 



Experiment 2: ESOP Management Application 

For this experiment, an application used for managing stock option grants of multiple 
organizations was considered. During each upload operation, a large number of records 
from a delimited file were processed. The application performed a mix of queries, inserts 
and updates. A brief outline of the program logic is given below to indicate the complexity 
of control-flow involved. 

For each record read from the input file, the program performs validation and pre- 
processing of the fields and then queries the options table to check if a record for the 
person already exists. The query predicate is parameterized on the values read from the 
input record. If a record is present, the old values of the various fields and the internal- 
emp-id are obtained as part of the same query. Further, the contactinfo table is queried 
using the internal- emp-id to obtain contact info fields. If the input record being processed 
has empty values for any of the fields, the old values (when present) are copied to those 
fields. Finally, new records are inserted or existing records updated in both options and 
contactinfo tables. Figure IE.2I in Appendix [E] shows the procedure. 

The rewritten program used an outer join in place of iterative selections, and per- 
formed batched updates and inserts. Figure I4.24I compares the performance of the rewrit- 
ten program with the original program for varying number of input records. 

In transforming the program, Rule-4 (control-dependencies to flow-dependencies), 
Rule-5 (reordering), Rule-2 (loop splitting) and Rule-1 (batching) were applied. After 

116 



transformation there was a 17% increase in the program size. 
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Figure 4.25: Performance Results for Experiment 3 

In this application, data about forms issued to various agents would arrive in the for- 
mat (agent-id, start-form-number, end-form-number). The program (shown in Figure [EJ] 
of Appendix [E]) would iterate over all the form issue records, expand the issue range and 
populate the forms-master table with entries corresponding to each individual form. The 
purpose was to be able to update and track the status of each individual form subsequent 
to its issue. The original program had an outer loop iterating over the form issue records 
and an inner loop iterating over the range (start-form-number, end-form-number). An 
INSERT operation was performed inside the inner loop. The transformed program could 
pull the insert operation out of both the loops and perform a batched insert. The running 
times of the original and transformed program are shown in Figure I4.25L The batched 
version performs much better for large batch sizes. For small batch sizes (less than IK) 
the computational overheads due to batch creation and nest/unnest operations cause the 
batched version to perform marginally slower than the original program. 

In transforming the program, Rule-5 (reordering), Rule-2 (loop splitting), Rule-1 
and Rule-6 (batching) were applied. The increase in program size was 16.5%. 
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Time Taken for Program Transformation 

Although the time taken for program transformation is usually not a concern (as it is 
a one-time activity), we note that, in our experiments the program transformation took 
very little time (less than a second). 

4.7 Related Work 

Queries such as those shown in Example 1 1 . 3 1 and Example 14. II can be thought of as nested 
queries with complex inner blocks. The inner block in such cases contains subqueries 
embedded in procedural code. Known decorrelation techniques such as [3TJ ESI SSI EZl EJ 
IT2] cannot be used to unnest such queries. The techniques proposed in this chapter make 
it possible to rewrite a procedure so as to enable set-oriented evaluation of the embedded 
sub-queries through batched bindings. This is an essential step in decorrelation [IS]. 
Further optimizations, such as pipelining the output of the expression that produces 
the parameter batch into the expression that consumes it, are possible and should be 
considered for future work. Graefe [22] highlights the benefits of batched bindings for 
speeding up index nested loops joins. Batched bindings not only help in performing 
10 efficiently, but can also make it possible to employ a set-oriented strategy at the 
operator level. Magic sets decorrelation [IS] employs parameter batches for decorrelation 
of nested queries. Our techniques can be thought of as extending this approach for 
complex procedures. Our work is a step towards combining query optimization with 
program analysis and transformation techniques; we believe this combination will give 
significant benefits for database applications. 

Lieuwen and DeWitt [31] consider the problem of optimizing set iteration loops in 
database programming languages. Their techniques can convert nested set iteration loops 
into joins. However, their work does not address the issue of batching procedure calls. 
The program transformation rules in this chapter can work with complex control-flow 
including if-then- else and while loops. Rule-2 in this chapter is a more general version 
of Rule T4 in [31]. An earlier work similar to [31] is [50] • It describes a programming 
language called Theseus for manipulating relational databases, and proposes optimiza- 
tion transformations, some of which are special forms of those presented in [34J. To 
the best of our knowledge, reordering of statements in a loop to facilitate batched in- 
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vocations of queries is not considered before. The works of Katz and Wong [27J, and 
Demo and Kundu [TU] address the problem of converting programs which make use of 
CODASYL FIND-statements into equivalent programs using relational queries. Similar 
to our work, their solution employs dataflow analysis. The aim of dataflow analysis in 
their work is to find dependencies between CODASYL FIND-statements so as to group 
the statements that access the same logically definable set of records. Fegaras [2] ad- 
dresses query optimization in object oriented databases, in the presence of object identity 
and in-place updates. Poulovassilis and Small [13] consider algebraic optimization of 
declarative, functional database programming languages. Their work considers a compu- 
tationally complete functional language and addresses issues such as termination, infinite 
data structures, but does not consider imperative language constructs and side-effects. 
Ceri and Widom [I] address the problem of deriving production rules for incremental 
maintenance of materialized views. The generated rules are set-oriented in the sense that 
they can process a set of changes performed on the base tables at once. In their work, 
the view definitions are assumed to be in SQL without any procedural constructs. 

Some of the program transformation techniques we employ are derived from those 
proposed in the area of parallelizing compilers [221 SSI [2]- However, the problem of 
batching differs from the problem of parallelizing in the following ways: (i) presence 
of flow-dependencies (described in Section 14.21) does not allow parallelization. However, 
batching is possible even if the order of two operations cannot be changed due to flow- 
dependencies, and (ii) as the aim of batching is to improve the performance of expensive 
10 bound queries and other database operations, it may be acceptable for the transfor- 
mations to introduce additional CPU operations or use extra memory to make batching 
possible. However, such approaches do not generally yield significant benefits in the con- 
text of parallelizing the instructions and are not considered to the best of our knowledge. 



4.8 Summary 

In this chapter we considered parameter batching as a means to improve performance 
of iteratively invoked database procedures and user-defined functions. We presented a 
technique, based on program analysis and transformation, to automate the generation of 
batched forms of procedures and to replace calls to stored procedures within imperative 
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program loops with a call to the batched form. Our implementation and performance 
study show the practicality and usefulness of the the proposed techniques. 

Procedural extensions to SQL offer new challenges and opportunities for query op- 
timization. To deal with these challenges query optimization must be augmented with 
program analysis and transformation techniques. Our work is a step towards combining 
query optimization with program analysis and transformation techniques; we believe this 
combination will give significant benefits for database applications. 
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Chapter 5 

Conclusions and Future Work 



In this thesis we looked at ways to speed up iterative execution of queries, user-defined 
functions and stored procedures. Iterative execution can happen either because known 
decorrelation techniques do not apply, for instance when the nested block is a complex 
procedure, or because an iterative plan has the least expected cost amongst the alterna- 
tives available to the query optimizer. We took a two pronged approach to address the 
problem. First, we looked at exploiting ordered parameter bindings to speed up itera- 
tive plans and presented extensions to a cost-based optimizer for choosing efficient sort 
orders. Next, we showed how to achieve set-oriented execution of queries and updates 
within complex procedures through parameter batching enabled by program rewrite. 

In Chapter [2] we showed how state retention of operators allows us to exploit sorted 
parameter bindings to improve the efficiency of iterative query plans. We then showed 
how a Volcano style cost-based query optimizer can be extended to take into account 
state retention and effects of sorted parameter bindings. An important problem, which 
arises in optimizing both nested queries as well as queries containing joins, grouping and 
other set operations is that of deciding the optimal sort order of parameters and inputs. 
We addressed this problem in Chapter [3j We showed that even a simplified version of 
the problem of choosing optimal sort orders is NP-Hard and gave principled heuristics, 
which take into account partially available sort orders and attributes common to multiple 
join predicates. We presented experimental results, carried out by forcing plans gener- 
ated by our optimizer extensions on widely used database systems. The results showed 
significant improvements in actual query execution time due to the proposed techniques 
when compared to the default plans chosen by the respective systems. Although it is dif- 
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ficult to quantify the percentage of real-world queries that can benefit from the proposed 
techniques, our experiments carried on queries taken from the TPC-H benchmark and a 
financial application, and our experience with ETL tasks in data warehousing indicate 
that a significant number of queries can benefit from the proposed techniques. 

In Chapter H] we considered iterative execution involving complex user-defined func- 
tions and stored procedures, which use a mix of procedural constructs and SQL. We 
proposed a program analysis and transformation based approach to enable set-oriented 
execution of queries and updates inside such procedures. The approach consists of a set of 
program transformation rules, which are used to (i) automatically generate the batched 
form of a given procedure and (ii) replace iterative calls to the procedure with code to 
construct a parameter batch and invoke the batched form. Our program transformation 
rules can deal with a rich set of language constructs such as looping, conditional control 
transfer and assignments. With the help of our implementation of the rewrite rules for a 
subset of Java, we carried out an experimental study on cases chosen from three real-world 
applications. The results are very promising, and show up to 75% improvement in the 
actual execution time. 

With increasing use of procedural extensions to SQL and emergence of language 
integrated querying paradigms (e.g., Microsoft LINQ [33]), combined optimization of 
application code and database queries/updates becomes more and more important for 
improving the application performance. Our work is a step towards this goal. 

The proposed program rewrite techniques are useful in two broad scenarios: (a) to 
transform programmer written loops with database access into potentially more efficient 
code and (b) to automatically generate batch processing routines from routines that were 
developed for one at a time request processing. There is a large body of code in languages 
like PL/SQL and Java, in the former category. For our experiments we considered two 
such examples (Experiment 1 and 3 in Chapter S]) from real- world applications^- Although 
it is hard to quantify the number of cases which have the second requirement, we have 
come across two cases in which many stored procedures were required to be rewritten for 
batch processing. One such example is considered in Experiment 2 of Chapter 4. 

Whether batching a specific query invocation yields significant benefits depends on 
the number of iterations, data characteristics and the physical design of the database. 

^^Due to confidentiality reasons we cannot name the applications and organizations. 
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Applying the program transformation rules presented in this thesis in a cost-based manner 
is a future work. 

Future Work 

There are several interesting challenges ahead to fully realize the goal of optimizing itera- 
tive invocation of complex procedural blocks and iterative invocation of queries/updates 
from application code. We briefly discuss some directions for future work. 

Sort Orders from Secondary Indices 

In Chapter [3J, while computing the set of approximate favorable orders on base relations, 
we consider only the clustering index and secondary indices that cover the given query. 
Although it is possible to use non query covering secondary indices to get ordered tuples 
(by traversing the index leaf pages), it is usually very inefficient due to random 10. 
However, if the actual tuple fetch can be deferred until a point where the extra attributes 
are needed in the query plan, it is possible that the approach can perform better. If 
a highly selective filter discards many rows before the extra attributes are needed, only 
a few tuple fetches happen. Evaluating such alternatives in a cost-based manner is an 
interesting future work. 

Cost-Based Choice of Queries to Batch 

Our current implementation requires manual input on which operations to consider for 
batching. An interesting and important problem is to make a cost-based decision on which 
queries/updates to batch. The important parameters on which this decision depends 
include (i) cost model for the operation as a function of batch size, (ii) expected number 
of iterations of the program loop (Hi) branch probabilities for the branching statements 
(if-then- else) in the program and (iv) overheads of the transformed code. 

Pipelined Execution of Queries Inside a UDF 

The L-tables, discussed in Section I4.3[ serve to hold the parameter batch with which the 
batched form of a procedure is invoked. Though small batches can be held in memory, 
in general we may need to materialize the batches and the cost of materialization must 
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be taken into account while deciding to batch an operation. However, for procedures 
that run entirely inside the database engine (e.g., UDFs) it may be possible to avoid 
materialization of batches by constructing a single dataflow containing both relational 
and procedural nodes. Our loop splitting transformation is designed to facilitate such an 
approach. Appendix O contains a few additional rules that can be used for (i) avoiding 
creation of intermediate batches by passing a relational expression, instead of a table, 
to the batched forms and (ii) mapping program statements that perform simple and 
inexpensive operations (e.g., expression evaluation and variable assignment) to operators 
that work on sets. The later of these helps in eliminating loops such as the one left over 
in Example 14.41 When a single dataflow is thus built, code that cannot be mapped to 
relational operators is executed by procedural nodes in the dataflow. Variable bindings 
inside such nodes are obtained from input tuples. 

Set-Oriented Evaluation in the Presence of Cyclic Flow-Dependencies 

The program transformation rules presented in Chapter H] and the rule application algo- 
rithm guarantee that a query execution statement s can be batched w.r.t. a loop if s 
does not lie on a true-dependence cycle in the DDG (Theorem 14.31) . Intuitively, the only 
statements we fail to batch are the statements whose execution in an iteration of the loop 
depends on the result of their own execution in a previous iteration. In such cases, the 
set of parameters for a query execution statement cannot be computed up front (without 
invoking the query itself). 

However, in certain cases, it may be possible to identify a superset of the parameter 
batch even if the query execution statement lies on a true-dependence cycle. For example, 
consider the iterative execution of a scalar aggregate query with a parameterized equality 
predicate in its where clause. An example of such a query would be: SELECT sum(balance) 
FROM account WHERE branch=:branch. If a true-dependence cycle makes it impossible 
to compute the exact set of parameters (in this example, branches) for the query, one 
can compute the query results for all the possible parameters by using a vector aggregate 
query, which for our example would be: SELECT sum(balance) FROM account GROUP 
BY branch. Note that such an approach can be used only for pure functions and not for 
operations with side-effects (even if the operation is batch-safe). 
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Appendix A 

Optimality with Minimal Favorable 
Orders 

The notion of minimal favorable orders, introduced in Section 13.31 served as the basis 
for our heuristics for selecting sort orders. Since it is hard to compute the exact set of 
minimal favorable orders, we used a heuristic approach to compute them approximately. 
However, it is interesting to study the properties of minimal favorable orders. In this 
section we give a proof of Theorem 13.61 stated earlier in Section 13.3.21 The theorem es- 
sentially states the following: to identify an optimal sort order, it is sufficient to consider 
only the minimal favorable orders and not the full set of favorable orders. Below, we 
repeat the formal statement of the theorem and present a proof. The proof makes use of 
notation introduced in Sections 13.1.11 and 13.3.11 

Theorem 3 The set I(e, o) computed with exact ford-min contains an optimal sort order 
o p for the optimization goal e = (ei M e r ) with (o) as the required output sort order. 

We prove Theorem 13.61 under the following assumption: If 0\, 02 are two sort orders on 
the same set of attributes {i.e., attrs(oi) = attrs(o2)), then the CPU cost of sorting the 
result of an expression e to obtain 0\ will be same as that for o 2 , i.e., cpu-cost(e, 0\) =cpu- 
cost(e, o 2 ). 

Proof: Consider the optimization goal for a join expression (e = e\ n e r , with (o) as the 
sort order required on the result of e. Let S be the set of join attributes and o' be any 
sort order on 5*. The cost of the best merge-join plan for e, when d is chosen as the sort 
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order for ei, e r , is given by: 

PC(e,o,d) = cbp(ei,d) + cbp(e r ,d) + coe(e,d ,o) + CM(ei,e r ), 

where CM(ei, e r ) is the cost of merging. (A.L 



In Equation IA.lt we note that CM(e\, e r ) is independent of the sort order d . 

Let o b be an optimal sort order for el n e r . Assume o b ^ 2T(e). We show that 3o p G 1(e) 
such that PC(o p ) =PC(o b ). 

Case 1: Suppose o b is such that o& ^ ford(ei) U ford(e r ). 

PC(e, o, o b ) = cbp(ei, o b ) + cbp(e r , o b ) + coe(e, o b) o) + 

CM(e h e r ) (A.2) 
Since, o& ^ ford(ei) U ford(e r ) we can write 
= c6p(e ; , e) + coe(e h e, o b ) + c6p(e r , e) + 

coe(e r , e, o&) + coe(e, o&, o) + CM(ei, e r ) (A. 3) 

Let o p be a sort order in 1(e) such that o A S < o p , where o is the required output 
sort order in the optimization goal. The existence of such a sort order in 1(e) directly 
follows from the construction of 1(e), specifically, steps 1 and 2 in Section T3.3.2I 

Since both o b and o p are sort orders on the same attribute set S, we have 

coe(ei, e, o b ) = coe(ei, e, o p ) and coe(e r , e, o b ) = coe(e r , e, o p ) (A. 4) 

Substituting Equation IA.4I in Equation IA. 31 we get: 

PC(e,o,o b ) = cbp(ei,e) + coe(ei,e,o p ) + cbp(e r ,e) + 

coe(e r , e, o p ) + coe(e, o b , o) + CM(e;, e r ) (A. 5) 
> cbp(e h o p ) + cbp(e r , o p ) + coe(e, o b , o) + 

CM(e h e r ) (A.6) 

As (oAS) < o p , we have (o b Ao) < (o p Ao) (because o b is a permutation of S). Therefore, 
coe(e,o b ,o) > coe(e,o p ,o). From this, we can rewrite Equation I A. 61 as: 

PC(e,o,o b ) > cbp(ei,o p ) + cbp(e r ,o p ) + coe(e,o p ,o) + CM(ei,e r ) 
> PC(e,o,o p ). 
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By assumption o& is an optimal sort order. So we conclude PC(e,o,Ob) = PC(e,o,o p ). 
In other words, X(e) contains a sort order o p having the same plan cost as the optimal 
sort order Ob- 

Case 2: Suppose Ob is such that Ob G ford(ei) or ford(e r ) but not both. 

Without loss of generality we assume o fe G ford(ei). This implies one of the following: 

(i) 3o' G ford-min(ei) such that Of, < o' and cbp(ei,Ob) = cbp(ei,o') or 

(ii) 3o' G ford-min(ei) such that o' < and cbp(ei, d) + coe(e;, o', o^) = cbp(ei, Ob). 

We now consider, each of these cases separately. 

Case 2- A: Suppose condition (i), repeated below as Equation IA.7[ holds. 

3o' G ford-min(ei) such that Ob < o' and cbp(ei, Ob) = cbp(ei, o') (A. 7) 

o' G ford-min(ei) implies (o' A S) G ford-min(ei, S). Therefore, from the construction of 
set X(e), we know: 

3o p G J(e) such that (o A S) < o p (A.8) 

Since Ob < o', we know (o?, A 5) < (o' A S) (A. 9) 

Substituting Equation IA.9I in Equation IA.81 we get (o& AS) < o p . Since both and o p 
are permutations of the same attribute set S, we must have Ob = o p . i.e., the optimal sort 
order Ob must be in 1(e). 

Case 2-B: Suppose condition (ii), repeated below as Equation IA. 101 holds. 

3o' G ford-min(ei) such that o < Ob and cbp(ei, o) + coe(ej, o', o?,) = cbp(ei, Ob) (A. 10) 

The plan cost for e, with Ob as as the chosen sort order, is given by: 

PC(e,o b ) = cbp(e h o b ) + cbp(e ri o b ) + coe(e 1 o b) o) + CM(e h e r ) 

Substituting for cbp(ei,Ob) from Equation IA. 101 we get 
= cbp(ei, o') + coe(ei, o', o b ) + cbp(e r , o b ) + coe(e, o b , o) + CM(ei, e r ) (A. 11) 

o' G ford-min(ei) implies 3o p G X(e) such that (o' A S) < o p . Since d < Ob, we know 
attrs(o') C S. Therefore, we have d A S = d . And hence, d < o p . Also, since both o p 
and Ob are permutations of S, we have |o&| = \o p \. 
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Since, o& ^ ford(e r ), we have cbp(e r , o&) = cbp(e r , o p ). Substituting this in Equation IA.11} 
we get: 

PC(e, o b ) = cbp(e h d) + coe(ei, o' , o b ) + cbp(e ri o p ) + coe(e, o b) o) + CM(e/, e r ) (A.12) 

Since o' < and o' < o p and |o(,| = |o p | we can write Equat ion I A . 1 2 1 as : 

PC(e, Ob) = cbp(ei, o') + coe(ei, o', o p ) + cbp(e r , o p ) + coe(e, Ob, o) + CM(ei, e r ) 

> cbp(ei, o p ) + cbp(e r , o p ) + coe(e, o b , o) + CM(e h e r ) (A.13) 

Now, we show that coe(e, o b , o) > coe(e, o p , o) to complete the proof. 
Case (a): Suppose, d < o. 

Since 1(e) contains a sort order which subsumes, OAS', it is possible to choose o p fromX(e) 
such that (o A S) < o p . This implies, \ob A o\ < \o p A o\. Hence, coe(e, Ob, o) > coe(e, o p , o). 
Substituting this in Equation IA. 131 we get: 

PC(e, Ob) > cbp(ei, o p ) + cbp(e r , o p ) + coe(e, o p , o) + CM(ei, e r ) 

> PC(e,o p ) 
Case (b): Suppose, d ^ o. 

Now, d A o = Ob A o = o p A o (because d < Ob and d < o p ). Therefore, coe(e, Ob, o) = 
coe(e,o p ,o). Substituting this in Equation IA. 13^ we get: 

PC(e, o b ) > cbp(e h o p ) + cbp(e r , o p ) + coe(e, o p , o) + CM(e h e r ) 

> PC(e } o p ) 

Case 3: Suppose Ob is present in both ford(ei) and ford(e r ) 
This implies one of the following: 

(i) 3o' G ford-min(ei) U ford-min(e r ) such that Ob < d . In this case the proof can 
proceed as in Case 2-A. 

(ii) 3oi G ford-min(ei) and 3o2 G ford-min(e r ) such that (a) o\ < o b and 02 < o b and (b) 
cbp(ei, oi) + coe(ei, 01, o b ) = cbp(ei, o b ) and (c) cbp(e ri o 2 ) + coe(e r , o 2 , o b ) = cbp(e ri o b ). 

Since 0\ < Ob and 02 < o b , either o\ < 02 or 02 < o\. Hence, 3o p G X(e) such 
that 01 < Op and o 2 < o p . Choosing such an o p , and proceeding as in Case 2-B we 
can prove PC(e,o b ) > PC(e,o p ) 

This completes the proof of Theorem 13.61 □ 
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Appendix B 

Correctness of Transformation Rules 



In this section, we give a formal proof of correctness of all the program transformation 
rules presented in Chapter HI 

The program state G comprises of values for all variables accessible at a program 
position p and the system state S comprises of the state of all external resources like 
database and file system. Let Pl be a program fragment that matches the LHS of a 
rule and Pr be the program fragment instantiated by the corresponding RHS. Let p be 
the position in the program at which Pl begins. Let (G, S) be the pair of any valid 
program and system states at p. To prove the correctness of a transformation rule, we 
must show the following. If the execution of Pl on (G, S) results in the state (G', S') then 
the execution of Pr on (G, S) will also result in the state (£?', S'). 

• Rule lA(i): In Rule lA(i), both Pl and Pr do not modify the program state (as 
we use a call by value semantics and there are no global variables). 

Consider the multiset S = IT£ .„ . with which qb is invoked. Let S' be the 
multiset of tuples constructed from parameters passed to each invocation of q inside 
the loop. We can see that S is multiset equivalent to S'. Now the equivalence of the 
two program fragments follows directly from the definition of batch-safe operation 
(when an operation is batch-safe the final system state depends only on the set of 
parameters and not the order of invocations). 

• Rule lA(ii): Proof is similar to that of Rule lA(i). 

• Rule IB: The only program state Pl and Pr modify is the table r. Let the initial 

state (state at the point where Pl/Pr begins) of the table be r init . Let r' be the 
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state the table reaches if Pl is executed and r" be the state the table reaches if Pr 
is executed. We show r' and r" to be multiset equivalent. 

Since q is a scalar query, from the definition of batched forms it follows that, during 
merge, each tuple in r init matches with exactly one tuple in the result of the batched 
invocation, qb(IL CTl ,..., Crrn ('r))- As a result, (i) cardinalities of r' , r" and r init are equal, 
i.e., \r'\ = \r"\ = \r init \ and (ii) For each tuple t G r init , there exists a distinct tuple 
t' G r' and a distinct tuple t" G r" such that t, t' and t" have the same values for all 
attributes except (possibly) the updated attributes viz., c wl , c w2 , ■ ■ ■ , c wn . 

Let t r be the tuple in the result of the batched invocation (qb) that matches (dur- 
ing merge) with tuple t of r init . Let (vi, v 2 , ■ ■ ■ , v m ) be the values of attributes 
c r i, c r2 , . • • , c rm of t. Therefore, attributes c r ±, c r2 , ■ ■ ■ , c rm of t r must also have the 
values (vi,v 2 , ■ ■ ■ ,v m ). Let (wi, w 2 , ■ ■ ■ , w n ) be the values of the remaining attributes 
(named c w y, c W 2>, ■ ■ ■ , c wn r) of t r . From the definition of batched forms, we have 
(wi, W2, ■ ■ ■ , w n ) = q{y i,v 2 , ... ,v m ). i.e., the tuple resulting from the merge (t" 
in r") has values assigned from q(vi, v 2 , ■ ■ ■ , v m ) for its attributes c wl , c w2 , ■ ■ ■ , c wn . 
From the LHS of the rule, it is clear that the corresponding tuple t' G r' also has the 
values of q(vi, v 2 , . . . , v m ) assigned for its attributes c w i, c w2 , . . . , c wn . This makes 
t' = t" and hence r' = r". 

Since q is a pure function the system state remains unaffected by both P L and P R . 

• Rule 1C: Proof is similar to the proof for IB. 

• Rule ID: The equivalence directly follows from the definition of batch-safe opera- 
tion. 

• Rule 2: Let Pl be the program fragment matching the LHS of the rule and Pr be 
the program fragment instantiated by the RHS. Let us call the while loop in Pl as 
L, the first {while) loop of Pr, which contains ss[, as L x , the cursor loop after that, 
which contains s', as L 2 , and the last cursor loop of Pr, which contains ss 2 , as L 3 . 

First, we note that there exists a one-to-one correspondence between statements 
in ss\ of L and statements in ss[ of L 1 . The correspondence follows from the 
construction of ss[. For every statement s x in ss±, the corresponding statement s' x 
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in ss[ performs exactly the same set of operations. 

Let v be the value of a variable a at statement s x in ssi in the i th iteration of L. As 
there are no loop-carried flow dependence edges crossing the split boundaries, it is 
evident that the statement that assigns value v to variable a must also appear in L 1 
or must precede L\. Therefore, we can see that v will be the value of a at s' x (the 
statement corresponding to s x ) in the i th iteration of L\. 

Similarly, we can see that values read by statement s' in the i th iteration of L 2 are 
same as the value read by statement s in the corresponding iteration of L. 

Now, consider a statement s y in ss 2 of loop L. Let v be the value of a variable a 
read by s y in the i th iteration of L. We now prove by induction on % that v will be 
the value read by the corresponding statement s' y in L 3 . 

Let i = l (the first iteration). In this case, the assignment of v to a must be 
performed by a statement that precedes s y in L. Therefore, we can observe that in 
the i th iteration of L 3 , either a statement in ss r or a statement that precedes s' y in 
ss 2 will assign v to a. And hence, s' y will read the value v from a. 

Now consider the k th iteration (i — k). In this case, s y will read the value of a which 
is either assigned by a preceding statement in the k th iteration of L, or a the value 
of a, which was present at the end of previous ((k — l) th ) iteration. In the former 
case, we will again have a statement that precedes s' y , which assigns value v to a. In 
the later case, we know by the induction hypothesis that the value of a at the end of 
(k — l) th iteration for both L and L 3 will be the same. Since a was not assigned in 
the k th iteration by any statement preceding s y , no statement in ss r will overwrite 
a. And hence, we see that v will be the value read by s' y in the k th iteration. 

Since there are no inter-statement dependencies involving external system state, the 
output and change in system state produced by any statement s x and the corre- 
sponding statement s' x will be the same. Further, for every non local variable gv, if 
the last assignment in Pl was made by a statement s x in the i th iteration of L, then 
in Pr the last assignment will be made by the corresponding statement s' x in the 
i th iteration of L l5 L 2 or L 2 . The only additional change introduced by Pr to the 
program state is the new variable loopkey, which is not used after the point where 
Pr ends and hence does not affect the program. Hence, the execution of both the 
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Pl and Pr result in equivalent program and system states. 

• Rules 3, 4 and 5: The equivalence of these rules is straight forward to infer. 

• Rule 6 A: As in the proof for Rule 1A, we can observe that the multiset of pa- 
rameters passed to qb in Pr is equivalent to the multiset of parameters passed over 
all the iterations of P L . Hence, from the definition of the batch-safe operation, the 
equivalence holds. 

• Rule 6B: The proof is similar to that of Rule IB. 
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Appendix C 



Additional Transformation Rules 



In Chapter we mentioned that our program transformation techniques can be extended 
to build a single dataflow for queries with UDF invocations, and thereby avoid mate- 
rialization of parameter batches. In this section, we present a few additional program 
transformation rules, which are useful for future work in this direction. 

Rule 7 in Figure ICTTl and Rule 8 in Figure 1(121 are useful in replacing loops containing 
simple expressions and assignment with relational operators and avoiding materialization 
of intermediate results. 

Batched forms of procedures are relation valued, i.e., they return sets of tuples. For 
example, the batched form of Figure 14.11 constructs the table to be returned iteratively. 
Rule 9 in Figure IC.3I is useful to convert such code into a set valued expression. The 
example in Appendix [D] illustrates the use of Rule 9. 

Rule 7 

for each t by ref in r [order by key] loop 

t.b = arith-exprit.ax, t.a 2 , . . . , t.a n ); 
end loop; 

t 

f ^A,arith-expr(t.ai,t.a2,...,t.a n ) as &('")> 

where A=schema(r)—{b} 

Figure C.l: Loops with Arithmetic Expressions and Assignment 
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Rule 8 



Let exprl be a side-effect free expression (e.g., a query). 

table rl = exprl (); 

table r2 = expr2(rl); 

deadijl) / / rl unused hereafter 

t 

table r2 = expr2(exprl); 

Figure C.2: Rule for Avoiding Materialization 

Rule 9 

table result; 

for each t [by ref] in r [order by key] loop 

result. addRecord((ci, c 2 , . . . , c„)); 

/ / where each q is a function of attributes of t. 
end loop; 
return result; 

t 

return select ci, C2, . . . , c n from r; 

Figure C.3: Rule for return Statement 

We omit a formal proof of correctness for these transformation rules, as their correctness 
is straight-forward to infer. 
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Appendix D 



Transformation Examples 



In this section, we illustrate the transformation of the UDFs in Example 11.31 and Exam- 
ple 14.11 as the rules get applied following the batching procedure in Figure 14.81 We call 
these two UDFs as UDF-1 and UDF-2 respectively. Here, we assume that every query 
needs to be batched (when possible) and with respect to all the loops enclosing it. 



Rewriting UDF-1 

• Generate the Trivial Batched Form: First, we generate the trivial batched 
form of the procedure as explained in Section 14.1.31 Figure ID. II shows the resulting 
procedure. 

• Isolate the Query Execution: We isolate the query (expression) to be batched 
using Rule-3 and convert the control-dependencies to flow-dependencies using Rule- 
4. The resulting procedure after applying these two rules is shown in Figure ID. 21 

• Split the Loop: Split the loop (by applying Rule-2) before and after the query 
execution statements. In this example, the pre-conditions for Rule-2 are directly 
satisfied and we do not need to reorder any statements. However, in some cases we 
may need to reorder the statements using Rule-5 to satisfy the pre-conditions for 
Rule-2. Figure lD~3l shows the resulting program. 

• Replace Loops with Batched Calls: Apply Rule-ID to remove the order-by 
clauses around batch-safe operations and then replace the iterations with batched 
calls using Rules IB and 1C. We further apply Rule 9 (Figure ICL3]) for the RETURN 

135 



TABLE count-offers-batched(TABLE rl) 
DECLARE 

TABLE result; 
BEGIN 

FOR EACH t IN rl LOOP 
FLOAT amount-usd; 

INT count-offers; // The return value named after the function 
IF (t.curcode == "USD") 

amount-usd := t. amount; 
ELSE 

amount-usd := t. amount * (SELECT exchrate FROM curexch 

WHERE ccode = t.curcode); 

END IF 

count-offers := SELECT count(*) FROM buyoffers 

WHERE itemid = t.itemcode AND price >= amount-usd; 
result. addRecord((t.itemcode, t. amount, t.curcode, count-offers)); 
END LOOP; 
RETURN result; 
END; 

Figure D.l: UDF-1: Trivial Batched Form 



TABLE count-offers-batched(TABLE rl) 
DECLARE 

TABLE result; 
BEGIN 

FOR EACH t IN rl LOOP 

FLOAT amount-usd; INT count-offers; 

BOOLEAN condl; FLOAT exchrate; 

condl := (t.curcode == "USD"); 

condl == true? amount-usd := t. amount; 

condl == false? exchrate := SELECT exchrate FROM curexch 

WHERE ccode = t.curcode; 
condl == false? amount-usd := t. amount * exchrate; 
count-offers := SELECT count(*) FROM buyoffers 

WHERE itemid = t.itemcode AND price >= amount-usd; 
result. addRecord((t.itemcode, t. amount, t.curcode, count-offers)); 
END LOOP; 
RETURN result; 
END; 

Figure D.2: UDF-1: After Applying Rules 3 and 4 



statement. The resulting program is given in Figure ID.4I Earlier, in Example 
we had shown this batched form with minor simplifications for readability. 



136 



TABLE count-offers-batched(TABLE rl) 
DECLARE 

TABLE result; INT loopkey = 0; 

TABLE (key, itemcode, amount, curcode, condl, exchrate, amount-usd, count-offers) r2; 
BEGIN 

FOR EACH t IN rl LOOP 

FLOAT amount-usd; BOOLEAN condl; 

RECORD rec; 

condl := (t. curcode == "USD"); 
condl == true? amount-usd := t. amount; 
rec. key = loopkey++; 
rec. itemcode = t. itemcode; 
rec. amount = t. amount; 
rec. curcode = t. curcode; 
rec. condl = condl; 
rec. amount-usd = amount-usd; 
r2.addRecord(rec); 
END LOOP; 

FOR EACH t BY REF IN r2 ORDER BY key LOOP 
t. condl == false? t. exchrate := 

SELECT exchrate FROM curexch WHERE ccode = t. curcode; 

END LOOP; 

FOR EACH t BY REF IN r2 ORDER BY key LOOP 

t. condl == false? t. amount-usd := t. amount * t. exchrate; 
END LOOP; 

FOR EACH t BY REF IN r2 ORDER BY key LOOP 

t.count-offers := SELECT count(*) FROM buyoffers 

WHERE itemid = t. itemcode AND price >= t. amount-usd; 

END LOOP; 

FOR EACH t BY REF IN r2 ORDER BY key LOOP 

result. addRecord((t. itemcode, t. amount, t. curcode, t.count-offers)); 
END LOOP; 

RETURN result; 
END; 

Figure D.3: UDF-1: After Loop Split 



Rewriting UDF-2 

UDF-2 (Example 14.11) contains two queries, one in statement s8 and the other in statement 
slO. As mentioned in Section I4.5[ we cannot batch the query in statement slO due to 
cyclic flow dependence. However, we can batch the query in statement s8 with respect to 
the WHILE loop (of s5) as well as the outermost cursor loop, which iterates over all the 
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TABLE count-offers-batched(TABLE rl) 
DECLARE 

TABLE (key, itemcode, amount, curcode, condl, exchrate, amount-usd, count-offers) r2; 
INT loopkey = 0; 
BEGIN 

FOR EACH t IN rl LOOP 

FLOAT amount-usd; BOOLEAN condl; 

RECORD rec; 

condl := (t. curcode == "USD"); 
condl == true? amount-usd := t. amount; 
rec. key = loopkey++; 
rec. itemcode = t. itemcode; 
rec. amount = t. amount; 
rec. curcode = t. curcode; 
rec. condl = condl; 
rec. amount-usd = amount-usd; 
r2.addRecord(rec); 
END LOOP; 

MERGE INTO r2 USING qlb(bl) AS qlb ON (r2.curcode = qlb.curcode) 
WHEN MATCHED THEN UPDATE SET exchrate = qlb.exchrate; 

// where the parameter batch bl is constructed as: 

// SELECT distinct curcode FROM r2 WHERE condl=false; 

// and the batched form qlb(bl) is defined as: 

// SELECT bl. curcode, c. exchrate FROM bl JOIN curexch c ON bl.curcode=c.ccode; 

FOR EACH t BY REF IN r2 ORDER BY key LOOP 

t. condl == false? t. amount-usd := t. amount * t. exchrate; 
END LOOP; 

MERGE INTO r2 USING q2b(b2) AS q2b 

ON (r2.itemcode=q2b. itemcode AND r2.amount-usd=q2b. amount-usd) 
WHEN MATCHED THEN UPDATE SET count-offers = q2b.count-offers; 

where b2 = SELECT distinct itemcode, amount-usd FROM r2; 

and q2b(b2) = SELECT b2. itemcode, b2. amount-usd, count(o. itemcode) AS count-offers 
FROM b2 LEFT OUTER JOIN buyoffers o ON o.itemid = b2. itemcode AND 

o. price >= b2. amount-usd 

GROUP BY b2. itemcode, b2. amount-usd; 

RETURN SELECT itemcode, curcode, amount, count-offers FROM r2; 
END; 

Figure D.4: UDF-1: The Final Batched Form 



parameters (in the trivial batched form). 

In Example 14.11 observe that splitting the WHILE loop (of s5) before and after 
the query execution statement (of s8) is not directly possible due to the loop-carried 
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dependencies from sll and sl2 to s5, s6 and s7, which violate pre-condition cl of Rule-2. 
We therefore, reorder of statements by moving statements s8 and s9 past sl2 (using Rule- 
5). We then split the WHILE loop and batch the query execution. The batched query 
execution is further pulled out of the outermost cursor loop in the trivial batched form 
using Rule-6. 

Figure ES] shows the final batched form of UDF-2. The functions NEST and UNNEST 
implement the nest and unnest operations discussed in Section 14.3.61 and take the corre- 
sponding arguments. NEST takes as its arguments the table, columns to be nested and 
the name for the resulting table-valued column. Similarly, the UNNEST method takes the 
table and the name of the table-valued column that needs to be unnested. 
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TABLE count-items-batched(TABLE pb) 
DECLARE 

TABLE (key, catid, loop-table2, totalcount) loop-tablel; 
INT loopkeyl = 0; 
BEGIN 

FOR EACH t IN pb LOOP 

INT totalcount := 0; INT top := 0; INT stack[100]; RECORD reel; 
stack[top] := t. catid; 
top := top + 1; 

TABLE (key, curcat, catitems) loop-table2; 
int loopkey2 = 0; 
WHILE top > LOOP 

RECORD rec2; 

top := top - 1; 

curcat := stack[top]; 

// Now push all the subcategory ids onto the stack 
FOR catrec IN SELECT category-id FROM category 
WHERE parent-category=curcat LOOP 

stack[top] := catrec. category-id; 

top := top + 1; 
END LOOP; 

rec2.key = loopkey2++; rec2. curcat = curcat; loop-table2.addRecord(rec2); 
END LOOP; 

reel. key = loopkeyl++; reel. catid = t. catid; recl.loop-table2 = loop-table2; 
loop-tablel. add Record (reel); 
END LOOP; 

temp = UNNEST(loop-tablel, " loop-table2" ); 

MERGE INTO temp USING qb(b) AS qbr(curcat, res) ON temp. curcat = qbr.curcat 
WHEN MATCHED THEN UPDATE SET catitems = res; 

/ / where the parameter batch b is constructed as: 

// SELECT distinct curcat FROM temp; 

// and the batched form qb(b) is defined as: 

// SELECT b. curcat, count(itemid) AS catitems 

// FROM b LEFT OUTER JOIN item ON category- id =curcat GROUP BY curcat; 

loop-tablel = NEST(temp, schema(loop-tablel.loop-table2), " loop-table2" ); 

FOR EACH reel BY REF IN loop-tablel ORDER BY key LOOP 

FOR EACH rec2 BY REF IN recl.loop-table2 ORDER BY key LOOP 

reel. totalcount := reel. totalcount + rec2. catitems; 
END LOOP; 

END LOOP; 

RETURN SELECT catid, totalcount FROM loop-tablel; 
END; 

Figure D.5: UDF-2: The Final Batched Form 
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Appendix E 



Procedures Used in Experiments 



This section gives pseudocode for the additional procedures used for performance evalua- 
tion in Section 14.61 Figure IE. II shows the procedure for Experiment-3 and the procedure 
for Experiment-2 is given in Figure IE.21 The functionality implemented by these proce- 
dures was explained earlier, in Section 14.61 

PROCEDURE expand-issued-forms(DATE issuedate) 
DECLARE 

INT num; 
BEGIN 

FOR EACH irec IN SELECT agentJd, start_no, end_no, issue_date 

FROM issuecLforms WHERE issue.date = issuedate LOOP 
num := irec.start_no; 
WHILE (num <= irec.end.no) LOOP 

INSERT INTO forms-master VALUES (num, irec.agentJd, irec.issue.date, 'NEW'); 
num := num + 1; 
END LOOP; 
END LOOP; 
END; 

Figure E.l: Procedure for Experiment 3 
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PROCEDURE emp-upload(VARCHAR filename) 
DECLARE 

// Data types of local variables omitted for brevity, 
empid, clientid, iempid, optcode, optinfo, termcode, taxinfo, 
city, state, zip, operation, curtaxinfo, curcity, curstate, curzip 
BEGIN 

fd := open(filename); 
linestr := readline(fd); 
WHILE (linestr ! = null) LOOP 

tokenize linestr and extract empid, clientid, optcode, ... zip 

// some validation and pre-processing code 

if(optcode == 0) 
optinfo = ... ; 

SELECT internl-empid into iempid, tax-info into curtaxinfo, 
FROM options WHERE client-id=clientid AND emp-id=empid; 

//If options has no record for the employee 
if(iempid == null) { 

operation := 1; // we must insert 

iempid := gen-new-id(); 

} 

else { 

operation := 2; // we must update 

SELECT city into curcity, state into curstate, zip into curzip 
FROM contactinfo WHERE internal-empid=iempid; 

// Retain the current values if new ones are blank 
if(taxinfo == "") 

taxinfo := curtaxinfo; 
if(city == "") 

city := curcity; 

} 

if(operation == 1) { 

INSERT INTO options VALUES (iempid, clientid, ... optinfo, ... taxinfo); 
INSERT INTO contactinfo VALUES(iempid, city, state, zip); 

} 

else { 

UPDATE options set option-info=optinfo, . . . tax-info=taxinfo 
WHERE internal-empid=iempid; 

UPDATE contactinfo SET city=city, state=state, zip=zip 
WHERE internal-empid=iempid; 

} 

linestr := readline(fd); 
END LOOP; 
END; 



Figure E.2: Procedure for Experiment 2 
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Appendix F 

API and Code Patterns 



As mentioned in Section 14.61 we implemented the transformation rules for Java because 
tools for Java program analysis are available in public domain. We make use of the SOOT 
optimization framework for obtaining data dependence information. To simplify the task 
of recognizing query execution statements and code patterns that match a rule, our current 
implementation requires that queries and updates be performed using our API layer built 
on top of JDBC During rewrite, we recognize these calls and transform them for batched 
bindings when possible. In this section, we give the details of our API layer and the Java 
code patterns, which map to constructs described earlier, in this thesis. SOOT uses an 
intermediate code representation called Jimple. Our implementation works on Jimple and 
transforms it back to Java. Recognizing Jimple code patterns corresponding to each of 
our API calls and Java code patters is relatively straight forward and we omit the details. 

• Query /Update Execution Statements: The class DBI in our implementation 
provides various methods for executing queries and updates. 

— Record executeScalar Query (int queryld, Record params) 

— Table executeQuery(int queryld, Record params) 

— int execute Update (int queryld, Record params) 

The queryld specifies a parameterized SQL query or update statement in a query 
registry. Unlike JDBC, where the query string is directly specified in the program, 
we make use of a registry. The query registry, in addition to the query string, also 
contains the manually written batched form for the query. Unlike position-based 
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parameters in JDBC, we use named parameters. The Record class is used to pass 
parameters as name-value pairs and also to obtain the result of a scalar query. The 
class Table implements a tuple set and is used to retrieve query results, and for 
constructing parameter batches. 

Table res = DBI.executeQuery(. . .); 
Iterator reslter = res.iterator(); 
while(reslter.hasNext()) { 

Record r = (Record) reslter. next(); 

} 

Figure F.l: Pattern for Cursor Loops 

• Looping Statements: while loops have a direct mapping to Java. Unlike some 
of the procedural languages offered by database systems (e.g., PL/SQL), Java does 
not have cursor loops. Each cursor loop maps to a sequence of statements in Java. 
Figure IF. II shows the code pattern. 

• Batched Execution: The class Table in our implementation can be used for con- 
structing parameter batches row by row using the method addRecord(Record r). The 
DBI class has methods for executing batched forms of queries by passing a batch of 
parameters and also for merging back the results. These methods are used by the 
transformed program (generated code). 

— Table executeBatchedQueryfint queryld, Table paramBatch, FilterPred pred) 

— void executeBatchedUpdatefint queryld, Table paramBatch, FilterPred pred) 

• Other Constructs: Control flow, assignment and other constructs in the sim- 
ple procedural language used in this thesis have a direct mapping to Java. Con- 
ditional statements, generated when control-dependencies are converted to flow- 
dependencies, are Java if statements, where the if-block contains a single statement 
and the predicate is a boolean variable. 



for each r in query 
end loop; 
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